Foreword


The 16th International Conference on Human-Computer Interaction, HCI 
International 2014, was held in Heraklion, Crete, Greece, during June 22-27, 
2014, incorporating 14 conferences/thematic areas: 


Thematic areas:

• Human-Computer Interaction
• Human Interface and the Management of Information

Affiliated conferences:

• 11th International Conference on Engineering Psychology and Cognitive Ergonomics
• 8th International Conference on Universal Access in Human-Computer Interaction
• 6th International Conference on Virtual, Augmented and Mixed Reality
• 6th International Conference on Cross-Cultural Design
• 6th International Conference on Social Computing and Social Media
• 8th International Conference on Augmented Cognition
• 5th International Conference on Digital Human Modeling and Applications in Health, Safety, Ergonomics and Risk Management
• Third International Conference on Design, User Experience and Usability
• Second International Conference on Distributed, Ambient and Pervasive Interactions
• Second International Conference on Human Aspects of Information Security, Privacy and Trust
• First International Conference on HCI in Business
• First International Conference on Learning and Collaboration Technologies


A total of 4,766 individuals from academia, research institutes, industry, and governmental agencies from 78 countries submitted contributions, and 1,476 papers and 225 posters were included in the proceedings. These papers address the latest research and development efforts and highlight the human aspects of design and use of computing systems. The papers thoroughly cover the entire field of human-computer interaction, addressing major advances in knowledge and effective use of computers in a variety of application areas.


This volume, edited by Randall Shumaker and Stephanie Lackey, contains papers focusing on the thematic area of virtual, augmented and mixed reality, addressing the following major topics:


• VAMR in education and cultural heritage
• Games and entertainment
• Medical, health and rehabilitation applications
• Industrial, safety and military applications


The remaining volumes of the HCI International 2014 proceedings are: 


Volume 1, LNCS 8510, Human-Computer Interaction: HCI Theories, Methods and Tools (Part I), edited by Masaaki Kurosu

Volume 2, LNCS 8511, Human-Computer Interaction: Advanced Interaction Modalities and Techniques (Part II), edited by Masaaki Kurosu

Volume 3, LNCS 8512, Human-Computer Interaction: Applications and Services (Part III), edited by Masaaki Kurosu

Volume 4, LNCS 8513, Universal Access in Human-Computer Interaction: Design and Development Methods for Universal Access (Part I), edited by Constantine Stephanidis and Margherita Antona

Volume 5, LNCS 8514, Universal Access in Human-Computer Interaction: Universal Access to Information and Knowledge (Part II), edited by Constantine Stephanidis and Margherita Antona

Volume 6, LNCS 8515, Universal Access in Human-Computer Interaction: Aging and Assistive Environments (Part III), edited by Constantine Stephanidis and Margherita Antona

Volume 7, LNCS 8516, Universal Access in Human-Computer Interaction: Design for All and Accessibility Practice (Part IV), edited by Constantine Stephanidis and Margherita Antona

Volume 8, LNCS 8517, Design, User Experience, and Usability: Theories, Methods and Tools for Designing the User Experience (Part I), edited by Aaron Marcus

Volume 9, LNCS 8518, Design, User Experience, and Usability: User Experience Design for Diverse Interaction Platforms and Environments (Part II), edited by Aaron Marcus

Volume 10, LNCS 8519, Design, User Experience, and Usability: User Experience Design for Everyday Life Applications and Services (Part III), edited by Aaron Marcus

Volume 11, LNCS 8520, Design, User Experience, and Usability: User Experience Design Practice (Part IV), edited by Aaron Marcus

Volume 12, LNCS 8521, Human Interface and the Management of Information: Information and Knowledge Design and Evaluation (Part I), edited by Sakae Yamamoto

Volume 13, LNCS 8522, Human Interface and the Management of Information: Information and Knowledge in Applications and Services (Part II), edited by Sakae Yamamoto

Volume 14, LNCS 8523, Learning and Collaboration Technologies: Designing and Developing Novel Learning Experiences (Part I), edited by Panayiotis Zaphiris and Andri Ioannou

Volume 15, LNCS 8524, Learning and Collaboration Technologies: Technology-rich Environments for Learning and Collaboration (Part II), edited by Panayiotis Zaphiris and Andri Ioannou

Volume 16, LNCS 8525, Virtual, Augmented and Mixed Reality: Designing and Developing Virtual and Augmented Environments (Part I), edited by Randall Shumaker and Stephanie Lackey

Volume 18, LNCS 8527, HCI in Business, edited by Fiona Fui-Hoon Nah

Volume 19, LNCS 8528, Cross-Cultural Design, edited by P.L. Patrick Rau

Volume 20, LNCS 8529, Digital Human Modeling and Applications in Health, Safety, Ergonomics and Risk Management, edited by Vincent G. Duffy

Volume 21, LNCS 8530, Distributed, Ambient, and Pervasive Interactions, edited by Norbert Streitz and Panos Markopoulos

Volume 22, LNCS 8531, Social Computing and Social Media, edited by Gabriele Meiselwitz

Volume 23, LNAI 8532, Engineering Psychology and Cognitive Ergonomics, edited by Don Harris

Volume 24, LNCS 8533, Human Aspects of Information Security, Privacy and Trust, edited by Theo Tryfonas and Ioannis Askoxylakis

Volume 25, LNAI 8534, Foundations of Augmented Cognition, edited by Dylan D. Schmorrow and Cali M. Fidopiastis

Volume 26, CCIS 434, HCI International 2014 Posters Proceedings (Part I), edited by Constantine Stephanidis

Volume 27, CCIS 435, HCI International 2014 Posters Proceedings (Part II), edited by Constantine Stephanidis


I would like to thank the Program Chairs and the members of the Program 
Boards of all affiliated conferences and thematic areas, listed below, for their 
contribution to the highest scientific quality and the overall success of the HCI 
International 2014 Conference. 

This conference would not have been possible without the continuous support and advice of the founding chair and conference scientific advisor, Prof. Gavriel Salvendy, as well as the dedicated work and outstanding efforts of the communications chair and editor of HCI International News, Dr. Abbas Moallem.

I would also like to thank the members of the Human-Computer Interaction Laboratory of ICS-FORTH, and in particular George Paparoulis, Maria Pitsoulaki, Maria Bouhli, and George Kapnas, for their contribution towards the smooth organization of the HCI International 2014 Conference.
Abstract. In this paper we propose a novel interaction technique that creates the illusion of tactile exploration of museum artefacts which are otherwise impossible to touch. The technique meets a need often expressed by museum curators: to keep technology in the background and direct the focus of the museum visitor's experience to the artefact itself. Our approach combines haptic interaction with an adaptation of a well-known optical illusion, enabling museum visitors to make sense of the real, untouchable artefact in an embodied way, using their sensory and motor skills. We call this technique Haptic Augmented Reality.


Keywords: Museum, haptics, touch, authenticity, haptic augmented reality. 


1 Introduction 


Touch is part of a larger complex of senses, the haptic sense, which interrelates mental and bodily processes. Haptic exploration is a fundamental experience that helps people perceive and make sense of the physical world around them. Sensory information about museum exhibits, especially surface texture and material, is particularly important for museum visitors, since the artefacts themselves are the centre of the social, educational and entertaining experience of a museum visit. Whilst the value of touch experiences can be debated, there is a growing literature on sensory engagement in museums which seeks to redress the imbalance that has traditionally allowed the visual sense to dominate [10] [14].

The emphasis on touch experiences in heritage settings and museums has emerged as a distinctive trend within this broader exploration of the senses [6] [24], alongside discussions of sensory perceptions of materiality as social constructs within both past societies and our own, with the two not necessarily coinciding [15]. The value of touch has thus received nuanced debate within museum studies and has been explored as a related set of sensory concepts [23]. One feature of the role of touch is the emotional connection between objects and people, and the charisma of objects: ancient artefacts displayed in museums can be seen as having an extended object biography that brings them into our contemporary cultural context [10] [16].

Within digital technologies and computer applications a number of views and directions have emerged, and the heritage sector in general is seeing a range of developments in the application of haptic and virtual presentations of objects within museums [11] [5] [8]. In the networking cluster described more fully below, the concern of heritage sector curators, exhibitions officers and conservators was not so much the value of adding touch to the museum experience as how to balance the curation of objects with the provision of touch experiences. The charisma of objects and people's desire to touch them were acknowledged, and well-known objects were seen as particularly problematic. For this reason, the installations discussed here focus on one of the Lewis chess pieces, which are amongst the most popular artefacts in the collections of National Museums Scotland.


2 Design Process


Virtual handling of museum artefacts lies within a complex context of different professional practices, technological development and end-user needs. As part of the Science and Heritage programme funded by EPSRC-AHRC, Linda Hurcombe led an international project bringing researchers from different disciplines into a networking cluster focused on “Touching the Untouchable: increasing access to archaeological artefacts by virtual handling”.

It was therefore appropriate to adopt a design-led, user-centred approach that would bring the many experts involved into a creative dialogue. The interaction technique we present in this paper was one of the outcomes of two design-led workshops, each lasting two days, held six months apart. Many of the key issues related to curatorial practice and technological development were described and discussed in the first workshop, and prototype ideas were developed and presented in the second. From the first meeting of this group it was evident that the heritage sector faced multiple issues and that there were many potential ideas for solutions.

The workshops involved 26 participants from 19 institutions and 6 countries. Disciplines included archaeology, conservation and curation, together with art and interaction design, computer science and haptic human-computer interfaces. Representatives from small and national museum collections, artefact specialists, the National Trust, Historic Palaces and the Royal National Institute for the Blind attended, each presenting a different emphasis and together offering a richly textured insight into professional practices. The transdisciplinary nature of the first workshop allowed key issues to be raised and discussed from a wide range of perspectives, while design sessions involved participants in collaborative hands-on work and cultivated a number of ideas that were developed into first prototypes and evaluated in the second workshop.

On the first day participants gave short position presentations on their work and the key issues as they saw them. There were also demonstrations of museum specimens and haptic technology. The second day consisted of a plenary session in which stakeholders discussed a broad range of themes and opportunities arising from the previous day. Topics included haptic device capabilities, archaeological research agendas, curation and end users, and the potential benefits of virtual handling.


Key issues that were raised included: 


• Haptic installations may deflect interest away from the ancient items on show, both physically and conceptually.

• The technology for virtual touch must not overwhelm its physical setting (e.g. a museum gallery) and must be able to cope with visitor numbers (size of machine, noise, ease of use).

• Can haptic experiences get away from the computer desktop?

• Products and solutions could be expected to be diverse according to the kind of user and their setting. Rapid prototyping could be explored for its practical issues and scope.

• Financial mechanisms for public display vary, and virtual technologies need to be assessed against device robustness, set-up costs and the expertise needed to maintain them.


The focus of the present paper is on two of the many prototypes that were developed in response to these key issues and presented in the second workshop for testing and evaluation. They were well received by the stakeholders and, after some corrections, were deployed and evaluated in two museums: the National Museum of Scotland in Edinburgh and the Orkney Museum in Kirkwall. These prototypes and evaluations stemmed from the first networking grant, which took them to proof-of-concept stage. More recent work was undertaken as part of a second grant, also led by Hurcombe within the Science and Heritage programme, which allowed these prototypes, along with some of the other ideas, to undergo more extensive public trials and development. The full range of installations developed is covered elsewhere [17]; here the focus is on one famous object presented in two contrasting ways.


3 The Prototypes 


The museum exhibit used is an iconic 12th-century Scottish artefact known as the Lewis Queen chess piece, which is displayed in a glass case in the National Museum of Scotland. Both prototypes use the same visual illusion but employ different media, one digital and one non-digital. The visual illusion is borrowed from the theatre and is called Pepper's Ghost: a large sheet of glass is placed between the audience and the stage, and the image of an actor hidden below the stage is reflected in the glass, giving the illusion that a ghost is on stage with the actors. Using the Pepper's Ghost illusion we employed two different media, a 3-D printed replica of the chess piece and a haptic device. Both used the glass of the museum case itself as the reflective surface, thus ensuring that the focus remained on the real object or the haptic experience.
The replica was created by laser scanning the original artefact and mirroring the scan (i.e. laterally inverting the scan data before printing) so that the user's experience would match the object in the case. The need to present mirror images is a recognised issue in virtual reality and co-location interfaces [3] [18]. The replica was then painted black so that it would absorb light, thereby reducing its own reflection in the glass case. The replica is placed facing the chess piece at an equal distance from the display glass (Fig. 1). When a user places her hands on the replica and concentrates her gaze on the original piece behind the glass, she can see her hands reflected in the glass, apparently touching the real artefact in the display case (Fig. 2). Because she sees the actual artefact (and her hands) while touching the replica, she experiences the sensation of actually touching the artefact itself. The illusion is further strengthened by placing a cover over the replica to shield it from the user's direct gaze. This cover also contains a light that illuminates the user's hands so that their reflection is brighter.
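The lateral inversion step can be illustrated with a short sketch. It assumes the scan is stored as a plain triangle mesh (a list of vertex coordinates plus index triples) and that negating one coordinate axis gives the required mirror; the function names are hypothetical and this is not the processing pipeline used for the prototype. Because a reflection reverses orientation, the sketch also swaps two indices in every triangle so that surface normals keep pointing outwards, which slicing software for 3D printing generally expects.

```python
def mirror_mesh(vertices, faces, axis=0):
    """Laterally invert a triangle mesh across the plane where coordinate `axis` is 0.

    vertices: list of (x, y, z) tuples; faces: list of (i, j, k) vertex-index triples,
    wound counter-clockwise when seen from outside (an assumed convention).
    """
    mirrored = [tuple(-c if i == axis else c for i, c in enumerate(v)) for v in vertices]
    # A reflection flips orientation, so reverse each triangle's winding to keep
    # the normals outward-facing after mirroring.
    flipped = [(i, k, j) for i, j, k in faces]
    return mirrored, flipped


def signed_volume(vertices, faces):
    """Signed volume of a closed mesh; positive when normals point outwards."""
    total = 0.0
    for i, j, k in faces:
        a, b, c = vertices[i], vertices[j], vertices[k]
        # Scalar triple product a . (b x c)
        cross = (b[1] * c[2] - b[2] * c[1],
                 b[2] * c[0] - b[0] * c[2],
                 b[0] * c[1] - b[1] * c[0])
        total += a[0] * cross[0] + a[1] * cross[1] + a[2] * cross[2]
    return total / 6.0


# Usage on a unit tetrahedron standing in for the scanned chess piece
tet_vertices = [(0, 0, 0), (1, 0, 0), (0, 1, 0), (0, 0, 1)]
tet_faces = [(0, 2, 1), (0, 1, 3), (0, 3, 2), (1, 2, 3)]
mv, mf = mirror_mesh(tet_vertices, tet_faces)
print(signed_volume(tet_vertices, tet_faces), signed_volume(mv, mf))
```

The signed volume is a quick sanity check: without the winding swap the mirrored mesh would have a negative signed volume, i.e. it would be inside out.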


Fig. 1. The Lewis Chess piece behind the glass and the mirrored 3-D printed replica 


The second prototype uses the same illusion but employs a Sensable™ Omni 6DoF haptic device instead of the user's hands. The haptic device is placed outside the display case, towards the left of where the replica was, so that the reflection of its pen-like stylus appears close to the artefact in the display glass. Instead of a replica, a haptic model created from the laser scan of the artefact is algorithmically positioned in the haptic device's workspace at an equal distance from the display case (Fig. 3). The haptic model is invisible, but it can be traced and felt in physical space by moving the stylus, using the same combination of visual and haptic feedback as in the replica prototype.
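Positioning the invisible model "at an equal distance from the display case" amounts to reflecting the scanned artefact across the plane of the glass. The sketch below assumes the glass plane is known as a point on it and a normal vector, both already expressed in the haptic device's workspace coordinates; how the prototype calibrated the device against the case is not described, so that step, the function names and the sample coordinates are hypothetical. A single reflection both mirrors the model and places it, so it can be applied directly to the unmirrored scan.

```python
import math


def reflect_point(p, plane_point, plane_normal):
    """Mirror point p across the plane through plane_point with normal plane_normal."""
    length = math.sqrt(sum(c * c for c in plane_normal))
    n = [c / length for c in plane_normal]                              # unit normal
    d = sum((pi - qi) * ni for pi, qi, ni in zip(p, plane_point, n))    # signed distance to plane
    return tuple(pi - 2.0 * d * ni for pi, ni in zip(p, n))


def place_haptic_model(artefact_vertices, plane_point, plane_normal):
    """Reflect every artefact vertex so the model sits in front of the glass, equidistant from it."""
    return [reflect_point(v, plane_point, plane_normal) for v in artefact_vertices]


# Usage: glass in the plane z = 0 (normal along +z), artefact point 0.1 m behind it
model = place_haptic_model([(0.02, 0.15, -0.10)], (0.0, 0.0, 0.0), (0.0, 0.0, 1.0))
print(model)
```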
Fig. 2. Visitor interaction with the replica. Her gaze is concentrated on the original artefact behind the glass


This is a novel way of using a haptic device to immerse museum visitors in a deeper understanding of museum exhibits. In [19] museum visitors explore the surface of a digital daguerreotype case from the collection of the Natural History Museum of Los Angeles County. Similarly, the Haptic Museum, developed at the University of Southern California, is a haptic interface that allows museum visitors to examine virtual museum artefacts with a haptic device [21]. The ‘Museum of Pure Form’ is a virtual reality system that allows the user to interact with virtual models of 3-D art forms and sculptures using haptic devices and small-scale haptic exoskeletons [1,2]. Senses in Touch II, installed in the Hunterian Museum in Glasgow, was designed to allow blind and partially sighted museum visitors, particularly children, to feel virtual objects in the collection via a PC and a Wingman haptic mouse [13]. These projects used detailed virtual models of museum artefacts and allowed visitors to explore them with haptic technology. Our goal was to move away from the computer screen and to use haptic technology in a way that evokes direct haptic interaction with the physical artefact, providing the illusion of touching it without actually doing so.

8 M. Dima, L. Hurcombe, and M. Wright

Fig. 3. The haptic device prototype (figure labels: visual focus, display case, invisible haptic model)

Equally, the Pepper's Ghost technique has been used in the Virtual Showcase, a mirror-based interface for viewing real artefacts augmented with virtual geometry [3]. In the Virtual Showcase no touch is used, as the focus is on overlaying additional virtual objects and other elements onto the real object. ARToolKit-based optical tracking is used to track the user's head movement in real time to ensure co-location with the virtual components. Head tracking was important in the Virtual Showcase because of the virtual geometry. In our prototypes no head tracking is required: as long as the replica or the invisible haptic model is placed in exactly the same orientation relative to the glass, and at the same distance from its surface, as the artefact, the illusion of co-location is preserved under all translations and rotations of the viewpoint that maintain a direct line of sight through the case wall to the real artefact.
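The claim that co-location survives changes of viewpoint can be checked numerically. The sketch below is only an illustration under simplifying assumptions, not part of either prototype: the glass is the infinite plane z = 0 (so the finite size of the case wall is ignored), a point on the artefact lies behind it, the corresponding replica or haptic-model point sits at the mirror position in front, and all coordinates are made-up values in metres. For each eye position it finds where the line of sight to the artefact crosses the glass and measures the angle between the direction to the eye and the ray that light from the replica takes after reflecting there; zero means the reflection lands exactly on the artefact.

```python
import math


def crossing_point(eye, target):
    """Point where the straight line from eye (z > 0) to target (z < 0) crosses the glass z = 0."""
    t = eye[2] / (eye[2] - target[2])
    return tuple(e + t * (a - e) for e, a in zip(eye, target))


def misalignment_deg(eye, artefact, replica):
    """Angle (degrees) between the replica's reflected ray and the direction to the eye."""
    x = crossing_point(eye, artefact)
    incoming = [xi - ri for xi, ri in zip(x, replica)]      # replica -> glass
    outgoing = (incoming[0], incoming[1], -incoming[2])     # law of reflection off z = 0
    to_eye = [ei - xi for ei, xi in zip(eye, x)]
    dot = sum(o * v for o, v in zip(outgoing, to_eye))
    norms = math.sqrt(sum(o * o for o in outgoing)) * math.sqrt(sum(v * v for v in to_eye))
    return math.degrees(math.acos(max(-1.0, min(1.0, dot / norms))))


artefact = (0.02, 0.15, -0.10)
replica = (0.02, 0.15, 0.10)        # mirror position: same distance in front of the glass
for eye in [(0.0, 0.4, 0.6), (-0.3, 0.5, 0.5), (0.25, 0.3, 0.8)]:
    print(eye, round(misalignment_deg(eye, artefact, replica), 6))
```

Moving the replica 2 cm further from the glass, e.g. to (0.02, 0.15, 0.12), produces a misalignment of a few degrees for the first eye position, which illustrates why the equal-distance placement matters.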


3.1 Embodiment and Sense-Making 


Our interaction with the world around us is embodied and multi-modal, and we make sense of the world by acting in it. Enactive knowledge is direct, in the sense that it is natural and intuitive, based on the perceptual array of motor acts. The goal of both prototypes was to create an embodied and immersive experience for visitors in order to provide a sense of authenticity for the ancient artefact. Embodiment and situated cognition place interaction at the centre of meaning-making and extend the concept of mind to include the body and environment [25], [7], [22].
The illusion of manipulation created by the presented interaction technique can be explained as directly analogous to a classic experiment in situated cognition and perceptual plasticity [20]. A subject places their hand on a table, hidden behind a partition; on the other side of the partition they can see a dummy hand. The real hand and the dummy hand are touched by a researcher's fingers, synchronized to touch the same place at the same time, but the subject can see only the finger that touches the dummy hand, not the one that touches their real hand. After a short time the subject perceives the dummy hand as their own. In the two prototypes, the visual stimulus of the researcher's visible finger is replaced by the reflection of the user's own hands or of the haptic probe. The synchronized haptic stimulus is provided by the user's fingers touching the replica, or is felt through the haptic device when it collides with the invisible virtual 3D model of the artefact. The haptic device enables haptic exploration and sense-making through multi-modal motor action, which makes it an enactive interface [12]. The combination of haptic and visual feedback in both prototypes enriches the array of senses engaged during the interactive experience and creates more dimensions of embodiment than visual cues alone.


4 Evaluation 


The replica-based prototype is created using the digital technologies of laser scanning, 3D modelling and rapid prototyping, but is itself a non-digital tangible interface. It offers a simple, robust and apparently technology-free interaction, backgrounding technology entirely. The haptic device is a digital interface and uses the same laser scan to build its virtual haptic model. Our intention was to compare these two prototypes in a real museum setting.

It is a challenge to evaluate user experience that is closely related to embodied, tacit understandings, as in this case. The evaluation goals concern subjective opinions of visitor focus, degree of engagement and phenomenological experience. As these goals are subjective and not easily mapped to any objective, quantifiable factor, our evaluation combined gathering feedback, verbal or written, with close observation of the way visitors used the interface, their gestural motions, and the social interaction among them during their experience.

We were present next to the exhibits at all times, interacting with the visitors, observing their interaction with the artefact through both interfaces, having informal discussions about their experience (often using probing questions), and triggering their reflection when something broke their experience. The visitors were then asked if they would like to fill in a qualitative questionnaire, in which they rated the following statements on a 5-point Likert scale (strongly disagree, disagree, neutral, agree, strongly agree):


1. It is important to engage more senses than the visual as part of the museum experience.

2. The installation/replica gave a sense of how the ancient object would feel.

3. The installation/replica was straightforward to use.

4. Because of the installation/replica there was a better understanding of the ancient objects.

5. Overall, the installation/replica enhanced the museum experience.
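The paper does not say how the ratings were aggregated. One common option, assumed here purely for illustration, is to code each label from 1 (strongly disagree) to 5 (strongly agree) and, because Likert data are ordinal, to summarise each statement by its median and the share of agreeing answers rather than by a mean. The mapping, the function name and the example answers are hypothetical and are not the study's data.

```python
from statistics import median

# Hypothetical coding of the five response labels
LIKERT = {
    "strongly disagree": 1,
    "disagree": 2,
    "neutral": 3,
    "agree": 4,
    "strongly agree": 5,
}


def summarise(responses):
    """Summarise one statement's answers: counts per label, median score, share agreeing."""
    if not responses:
        raise ValueError("no responses to summarise")
    labels = [r.strip().lower() for r in responses]
    scores = [LIKERT[label] for label in labels]      # KeyError flags an unexpected label
    counts = {label: labels.count(label) for label in LIKERT}
    return {
        "n": len(scores),
        "counts": counts,
        "median": median(scores),
        "agree_share": sum(s >= 4 for s in scores) / len(scores),
    }


# Usage with invented answers for a single statement
print(summarise(["agree", "Strongly agree", "neutral", "agree"]))
```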


They were also prompted to write any more detailed comments. 

Both prototypes attracted a number of visitors, most of whom were eager to talk with the researchers and learn more about the project and its technical details. Many visitors stayed for a considerable time exploring the possibilities of the interface, giving us verbal feedback and discussing details of its use. As only one person at a time could use each prototype, visitors would gather around and watch, conversing with one another and with the researchers until it was their turn. Children were particularly drawn to the installations and tended to stay longer and explore them. This suggests that both interfaces can enable playful engagement.

There were 60 questionnaire responses over the two-day installation at the National Museum of Scotland and the two days at the Orkney Archaeological Museum. These initial empirical results indicated the potential of both prototypes to provide a novel embodied experience of untouchable artefacts. Visitors' comments were very positive for both prototypes, but particularly for the replica.

The rapid prototype installation successfully produced the sense of haptic exploration of the chess piece in a natural and simple way. The setup synchronised the visitors' visual and haptic cues, and consequently their interaction with the replica was experienced as interaction with the real statue. One visitor commented, ‘As I felt it, I felt like I was touching the one in the reflection and not the replica’. Another said that it ‘feels real and that you feel more connected to its history’.

One drawback of the replica installation was the double image of the hands, caused by light reflecting from both the front and the back surfaces of the Perspex case, with refraction through the sheet offsetting the second reflection. A few visitors found this a bit distracting, though not detrimental to the whole experience. The double image could be reduced in future versions by calculating the optical parameters for a specific position where the visitor will stand. Another interesting comment, made by three visitors, was that the texture of the replica should be improved to match the material of the original piece as closely as possible. This could improve the perception of the exhibited piece and will be taken forward in future designs.
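The spacing of the double image, and why it depends on where the visitor stands, follows from the geometry of a flat transparent sheet. One image is reflected at the front surface; the other is refracted into the sheet, reflected at the back surface and refracted out again, emerging parallel to the first. For a sheet of thickness t and refractive index n viewed at an angle of incidence θ_i, Snell's law gives sin θ_i = n sin θ_t, and the perpendicular offset between the two reflected rays is s = 2 t tan θ_t cos θ_i. The sketch below evaluates this formula; the sheet thickness is not reported, so the values used are illustrative, and n ≈ 1.49 is assumed as a typical figure for acrylic (Perspex).

```python
import math


def ghost_offset(thickness_mm, incidence_deg, refractive_index=1.49):
    """Perpendicular offset (mm) between front- and back-surface reflections of a flat sheet.

    refractive_index defaults to about 1.49, a typical value for acrylic (assumed).
    """
    theta_i = math.radians(incidence_deg)
    theta_t = math.asin(math.sin(theta_i) / refractive_index)   # Snell's law inside the sheet
    return 2.0 * thickness_mm * math.tan(theta_t) * math.cos(theta_i)


# Usage: offset for a hypothetical 6 mm sheet at several viewing angles
for angle in (0, 15, 30, 45, 60):
    print(angle, round(ghost_offset(6.0, angle), 2))
```

The offset is zero at normal incidence, grows over the viewing angles a standing visitor is likely to use, and scales linearly with thickness, so fixing the visitor's position or using a thinner sheet could keep the two images closer together.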

The main drawback of the haptic interface, reported in both discussions and written comments, was that the haptic device could not provide a detailed outline of the statue. Most visitors could not easily perceive the fine details of the statue with the stylus. One reason for this was the size and level of detail of the exhibit; the installation could work very well for larger objects, or for small objects with little detail. The lack of precision could be partly addressed by developing a more sensitive collision detection system between the haptic device controller and the haptic geometry, allowing more detailed tracing of the carved details. One planned test is to use an artefact with few details and compare user responses, both verbal and bodily, with those received in this study. The aim will be to investigate the extent to which the interface conveys sufficient realism, starting from relatively simple objects. The lack of detailed information was also attributed to the single-point contact of the device, compared with the multi-finger touch of the hands.
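Single-point haptic rendering of the kind discussed here typically computes, at a high update rate (around 1 kHz is commonly cited), a force on one interface point from how far it has penetrated the virtual surface. A minimal sketch of the simplest such scheme, penalty (spring) rendering with F = k d n, where d is the penetration depth and n the outward surface normal, is shown below. A sphere stands in for the scanned mesh, the stiffness is an illustrative value, and the paper does not state which rendering method its prototype used. Penalty forces are known to let the probe pop through thin features, which is one reason constraint-based ("god-object" or proxy) methods are often preferred for tracing fine carved detail.

```python
import math


def penalty_force(probe, centre, radius, stiffness=500.0):
    """Spring force (N) pushing a haptic interface point out of a sphere; positions in metres.

    stiffness is in N/m and is an illustrative value, not a device specification.
    """
    offset = [p - c for p, c in zip(probe, centre)]
    dist = math.sqrt(sum(o * o for o in offset))
    depth = radius - dist                       # positive only when the probe is inside
    if depth <= 0.0 or dist == 0.0:
        return (0.0, 0.0, 0.0)                  # no contact (centre case skipped for simplicity)
    normal = [o / dist for o in offset]         # outward normal at the nearest surface point
    return tuple(stiffness * depth * n for n in normal)


# Usage: a 1 cm sphere, probe 1 mm below its top surface
print(penalty_force((0.0, 0.0, 0.009), (0.0, 0.0, 0.0), 0.01))
```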
5 Discussion


The visitors agreed unanimously that the combination of visual and haptic cues gave a much better sense of the object and increased the sense of authenticity compared with just viewing it in its case. Because of the size of the chess piece, some visitors commented that a seated position would bring the piece to their eye level and reduce the fatigue of standing while using the interface.

The comparison between the haptic device and the replica showed that multi-finger tactile interaction with the replica produced considerably richer information than the single-point contact of the haptic device. The surface texture and material of an artefact play a significant role in its haptic exploration, and the PHANToM haptic device cannot render them at sufficient quality. The authors acknowledge that a single contact point conveys relatively little information compared with the range of haptic sensations a finger receives when touching a texture or a complex surface. The key issue is whether small additions to the museum experience are worthwhile.

The two different installations allow comparative assessments of such issues, which add to the current literature on heritage applications of haptic and virtual experiences with objects. Our research juxtaposed two different experiences of the same object, allowing direct comparison. The relative costs, maintenance issues and ease of use, as well as the visitor feedback and comments, all pointed in favour of the computer-mediated but physical replica over the active haptic device. Yet this outcome was not predictable without the trial: the readiness of visitors to engage with the virtual reflection, and the co-alignment of the visual image of the hand with the touch experience, was one of the key trial results. The haptic pen, by contrast, could have been handled by visitors in much the same way as a simple wooden stick drawn across the face of a textured object to probe aspects of its morphology and textures.

The trial results certainly relate to cognitive perception, but they also relate back to the clear directive of the end users: to hide the technology and not let it overwhelm the visitors. Visitors were clearly more comfortable aligning real touch of a hidden replica with a co-located virtual reflection than operating an obviously computer-related and largely unfamiliar device. Although the pen was fairly robust and easy to use once shown, few visitors knew about haptic pens, and the device by its nature could not be hidden. These are important aspects of the willingness of visitors to engage with unfamiliar technologies versus their desire to interact with objects within glass cases. Similar results have been highlighted in other research [5] [11], reinforcing our conclusion that the familiarity of the touch experience at the level of embodied practice can affect visitor perceptions, but that as haptic devices develop and become more mainstream they can be applied more easily. Still, these statements are based on observing visitors' readiness to engage with the installations and on some visitor comments about their preferences between the two. It is more difficult to determine whether this preference reflects familiarity or the greater immediacy that the replica, whether produced by 3D printing or other means, offers compared with the haptic pen.
Finally, 3D printing technologies are falling in price and are no longer prohibitively expensive. Compared with using a PHANToM, the 3D-printed replica is a lower-cost solution.


6 Further Developments 


Further innovations have been explored within the project to give better textures and 
tactile qualities such as weight and more personal interactions but these raise many 
other issues which require fuller discussion elsewhere [17]. 

The results of the comparison between the two interfaces indicated that the haptic device prototype provides a less complex sensation of the artefact. However, there is ample scope for using haptic devices within this setup if the interaction mechanism is enriched with dynamic elements and other modalities, for example sound feedback, extra touchable geometry, explanations and texture, all of which can deliver dynamic, personalized information. While this is also possible with the replica (e.g. by using depth cameras to calculate the position of the hands), implementation through a haptic device is much easier and more cost-effective.

Another development, specific to the haptic device prototype, is to use the interface
with museum artefacts that have missing parts, as the device can be used to feel the
invisible missing piece. In addition, a draw function can be implemented through which
users can draw extra geometry. Early research on this process is presented in [9], and
the Virtual Showcase [3] mentioned in the literature review also allows stereoscopic
images to be overlaid on top of real objects. We envisage that this study will have
numerous applications in museum research as well as in learning.


7 Conclusion 


We have presented a novel haptic interaction paradigm which gives the impression of
direct haptic interaction with museum artefacts in their display cases. The prototypes
solve the problem of technology taking focus from the artefact, as attention is not on a
graphic display or replica but on the real artefact itself. The approach was tested in a
real museum environment and was found to provide enhanced engagement with a real
precious artefact. Compared to the digital prototype, the non-digital one conveyed
richer sensory information about the artefact during interaction. However, the digital
interface offers the opportunity to easily add extra interactive elements that can
enhance immersion. While much remains to be done, our work shows that the technique
we developed has the potential to become a useful way of evoking multimodal embodied
exploration of intangible artefacts, with significant educational and economic
advantages for museums and similar exhibition and learning spaces.
Abstract. This work aims to design an academic experience involving the
implementation of an augmented reality tool in architecture education practices to
improve students' motivation and final marks. We worked with different platforms for
mobile devices to create virtual information channels through a database associated
with 3D virtual models and any other type of media content, which are geo-located in
their real position. The basis of our proposal is the improvement in spatial skills that
students can achieve by using their innate affinity with user-friendly digital media such
as smartphones or tablets, which allow them to visualize educational exercises in real
geo-located environments and to share and evaluate their own proposals on site. The
proposed method aims to improve access to multimedia content on mobile devices,
allowing access to be adapted to all types of users and contents. The students were
divided into control and experimental groups according to the devices they used and
the activities to be performed. The goal they were given was to display geo-referenced
3D architectural content using SketchUp together with ArMedia for iOS and a custom
platform for the Android environment.


Keywords: Augmented reality, e-learning, geo-e-learning, urban planning,
educational research.


1 Introduction


The implementation of new technology in teaching has been extended to all levels and
educational frameworks. However, these innovations require approval, validation and
evaluation by the final users, the students. A second step of the proposal (to be carried
out in the first semester of 2014) will be to discuss the advantages and disadvantages of
a mixed evaluation approach in a case study of the use of interactive and collaborative
tools for the visualization of 3D architectural models. We will combine quantitative and
qualitative approaches to measure the level of motivation and satisfaction with this type
of technology and to obtain adequate feedback that allows this type of experiment to be
optimized in future iterations.
The current paper is based on three main pillars. The first pillar focuses on teaching
innovations within the university framework that foster higher motivation and
satisfaction in students. The second pillar concerns how to implement such an
innovation: we propose the use of specific tools (AR) from so-called Information
Technologies (IT), so that students, as “digital natives,” will be more comfortable in
the learning experience. Finally, the study will employ a mixed analysis method to
identify concretely the most relevant aspects of the experience that should be improved,
both in future iterations and in any new technological implementations within a
teaching framework.


2 Background 


Augmented reality (AR) technology is based on overlaying virtual information on real
space. AR technology makes it possible to mix computer-generated virtual objects with a
real environment, generating a mixed environment that can be viewed in real time
through any technological device. The main characteristics of an augmented reality
system are [1]:


• Real-time interactivity
• Use of 3D virtual elements
• Mix of virtual elements with real elements


Augmented reality has emerged from research in virtual reality. Virtual reality
environments make total immersion in an artificial three-dimensional (3D) world
possible. The use of virtual reality (VR) techniques in the development of educational
applications brings new perspectives to engineering and architecture degrees. For
example, through interaction with 3D models of the environment, the whole
construction sequence of a deck in time and space can be simulated for students' better
understanding [2]. Hidden structures can also be explored through ghosted views within
real-world scenes [3], and there are several examples of AR and VR applied to
monitoring the maintenance of new buildings and to preserving cultural heritage [4-6].

Evaluating the use of VR or AR applications in an industrial setting is a complex task,
but some statistics suggest performance improvements of up to 30%, with the employees
involved reporting higher levels of engagement [7]. AR applications that support
technicians in the field have the potential to reduce costs by up to 25% through quicker
maintenance or component substitution, identification and setup of new connections,
and resolution of faults and misconfigurations, with less burden on back-end personnel
and system resources.


2.1. Recent Improvements in Mobile Learning 


Between 2008 and 2009, new platforms and paradigms emerged to propel AR development
on smartphones, such as Junaio, Layar and Wikitude. All of these companies embraced a
new concept that consisted of creating an augmented reality browser with a number of
features that allowed developers to produce AR content according to a specific set of
rules and, finally, enabled end users to view computer-generated elements superimposed
on the live camera view of common smartphones. These AR browsers are compatible with
most mobile operating systems, such as Android, iPhone OS or Symbian.


Fig. 1. AR at the UPC University. A 3D model visualized through a mobile device screen
thanks to the camera detection of a regular shape or code.
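The paper does not describe how these AR browsers decide where to draw an element. As a minimal sketch, assuming a pinhole camera model, a compass heading for the device and a known horizontal field of view, a geo-located point of interest can be placed horizontally on the live camera view as follows. The function names and parameters are hypothetical and are not part of Junaio, Layar or Wikitude.

```python
import math
from typing import Optional


def bearing_deg(lat1: float, lon1: float, lat2: float, lon2: float) -> float:
    """Initial great-circle bearing from point 1 to point 2, clockwise from north, in [0, 360)."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dlon = math.radians(lon2 - lon1)
    x = math.sin(dlon) * math.cos(phi2)
    y = math.cos(phi1) * math.sin(phi2) - math.sin(phi1) * math.cos(phi2) * math.cos(dlon)
    return math.degrees(math.atan2(x, y)) % 360.0


def poi_screen_x(user_lat: float, user_lon: float, heading_deg: float,
                 poi_lat: float, poi_lon: float,
                 hfov_deg: float, width_px: int) -> Optional[float]:
    """Horizontal pixel at which a point of interest should be drawn, or None if off-screen."""
    # Signed angle between the camera's compass heading and the POI, wrapped to [-180, 180)
    delta = (bearing_deg(user_lat, user_lon, poi_lat, poi_lon) - heading_deg + 180.0) % 360.0 - 180.0
    half_fov = hfov_deg / 2.0
    if abs(delta) > half_fov:
        return None  # the POI lies outside the camera's horizontal field of view
    # Pinhole camera: focal length in pixels derived from the horizontal field of view
    focal_px = (width_px / 2.0) / math.tan(math.radians(half_fov))
    return width_px / 2.0 + focal_px * math.tan(math.radians(delta))
```

For example, `poi_screen_x(41.0, 2.0, 0.0, 41.001, 2.0, 60.0, 1080)` places a point due north of a north-facing user at the center of a 1080-pixel-wide view (540.0). Vertical placement, device tilt and distance-based scaling are left out for brevity.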

One framework in which this technology could be used in particularly interesting ways
is the representation and management of territory, because real scenes could be
“completed” with virtual information. This would facilitate a greater awareness and
better understanding of the environment, especially in an educational framework. Last
year, research at universities worldwide focused on the development of AR applications
(AGeRA [1], GIS2R [8], ManAR [9]), tools (GTracer for libGlass [10]), educational
platforms (TLA [11]), or open resources and contents (ISEGINOVA AR Project [12]) such
as 3D architectural models (3D ETSAB AR [13-14]).


2.2. GIS Limitations 


Real-time performance and qualitative modeling remain highly challenging, and in situ
3D modeling has become increasingly prominent in current AR research, particularly for
mobile scenarios [15]. The main problem of all these applications seems to be the
location or geographical information, because a Geographic Information System (GIS) is
needed to provide, manage and filter public queries with different levels of accuracy
and updatable information. In short, we need to link a 3D model to a database that
contains all the necessary information associated with it. Furthermore, the introduction
of new learning methods using collaborative technologies offers new opportunities to
provide educational multimedia content.
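To illustrate what linking a 3D model to a database with location and accuracy information might look like, here is a minimal in-memory sketch. The `GeoModel` record, its fields and `query_nearby` are hypothetical; a real GIS would use a spatial index and a proper geodetic library rather than a linear scan with the haversine formula on a spherical Earth.

```python
import math
from dataclasses import dataclass, field

EARTH_RADIUS_M = 6_371_000.0  # mean Earth radius (spherical approximation)


@dataclass
class GeoModel:
    """Hypothetical record linking a 3D model to its real-world location and media."""
    model_id: str
    lat: float
    lon: float
    accuracy_m: float  # how precisely the stored location is known
    media: dict = field(default_factory=dict)  # e.g. mesh file, notes, images


def haversine_m(lat1: float, lon1: float, lat2: float, lon2: float) -> float:
    """Great-circle distance in meters between two latitude/longitude points."""
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dp = p2 - p1
    dl = math.radians(lon2 - lon1)
    a = math.sin(dp / 2) ** 2 + math.cos(p1) * math.cos(p2) * math.sin(dl / 2) ** 2
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(a))


def query_nearby(db, lat, lon, radius_m, max_accuracy_m=None):
    """Models within radius_m of (lat, lon), nearest first, optionally filtered by accuracy."""
    hits = []
    for rec in db:
        if max_accuracy_m is not None and rec.accuracy_m > max_accuracy_m:
            continue  # skip records whose stored location is too imprecise
        d = haversine_m(lat, lon, rec.lat, rec.lon)
        if d <= radius_m:
            hits.append((d, rec))
    hits.sort(key=lambda pair: pair[0])
    return [rec for _, rec in hits]
```

Filtering by `max_accuracy_m` mirrors the idea of serving queries with different levels of accuracy; all coordinates used with this sketch would be placeholders, not surveyed positions.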
While GPS (Global Positioning System) has satisfactory accuracy and performance in open
spaces, its quality deteriorates significantly in urban environments: both the accuracy
and the availability of GPS position estimates are reduced by shadowing from buildings
and by signal reflections. Outdoor mobile AR applications rely largely on the
smartphone's GPS, which estimates the user's position by trilateration, using the
signals that the receiver captures from at least three or four visible satellites.
Standard GPS systems have an accuracy of 5 m to 30 m due to limitations such as [16]:


• Being unavailable (or slow to obtain a position) when satellite signals are absent
(such as underground) or when meteorological conditions block transmission; and

• Satellites providing erroneous information about their own position.
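Trilateration can be illustrated in a simplified planar setting. The sketch below is not how a GPS receiver is actually implemented: it recovers a 2D position from known anchor positions and measured distances by subtracting the first circle equation from the others and solving the resulting linear system in the least-squares sense. A real receiver works in 3D with pseudoranges and must also estimate its own clock bias, which is why a fourth satellite is normally required.

```python
import numpy as np


def trilaterate_2d(anchors, ranges):
    """Least-squares 2D position from anchor coordinates and measured distances.

    Each measurement gives (x - xi)^2 + (y - yi)^2 = ri^2. Subtracting the first
    equation from the others removes the quadratic terms, leaving A @ [x, y] = b.
    """
    anchors = np.asarray(anchors, dtype=float)
    ranges = np.asarray(ranges, dtype=float)
    x1, y1 = anchors[0]
    r1 = ranges[0]
    A = 2.0 * (anchors[1:] - anchors[0])
    b = (r1 ** 2 - ranges[1:] ** 2
         + np.sum(anchors[1:] ** 2, axis=1) - (x1 ** 2 + y1 ** 2))
    solution, *_ = np.linalg.lstsq(A, b, rcond=None)
    return solution
```

With three anchors the system is exactly determined; with more, `lstsq` returns the best fit, which is how redundant measurements help absorb range errors of a few meters.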


Well-known applications already include Wikitude, Nokia City Lens, Google Goggles and
Metaio Junaio. The stability and precision of today's sensors have noticeably improved.
For example, GPS accuracy is increased with differential GPS (DGPS), which brings
readings to within 1-3 meters of the true position, compared to the 5-30 meters of
normal GPS. DGPS works using a network of stationary GPS receivers [17]. The difference
between their predefined position and the position calculated from the satellite signals
gives the error factor. This error component is then transmitted as an FM signal to
local GPS receivers, enabling them to apply the necessary correction to their readings.
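The correction step described above can be sketched in the position domain. This is a simplification, assuming that the reference station and the rover are close together and see the same satellites; operational DGPS services typically broadcast per-satellite pseudorange corrections rather than a single position offset. The `Enu` type (local east/north/up coordinates in meters) and the function names are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Enu:
    """Local east/north/up coordinates in meters (hypothetical local frame)."""
    e: float
    n: float
    u: float


def correction(station_known: Enu, station_measured: Enu) -> Enu:
    """Error factor observed at the reference station: known position minus measured position."""
    return Enu(station_known.e - station_measured.e,
               station_known.n - station_measured.n,
               station_known.u - station_measured.u)


def apply_correction(rover_measured: Enu, corr: Enu) -> Enu:
    """Rover position after adding the broadcast correction to its own reading."""
    return Enu(rover_measured.e + corr.e,
               rover_measured.n + corr.n,
               rover_measured.u + corr.u)
```

If the station measures itself 4 m east and 3 m south of its surveyed position, the same offset is assumed to affect a nearby rover and is subtracted from its reading.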


2.3. ICT at University


Recent experiences of implementing ICT in university degrees concluded that “digital
natives” with regular activity on networks and chats are better students [18]. The use
of VR technologies in practical courses for graduate and undergraduate students aims to
develop personal skills [19] introduced in the European Higher Education Area (EEES),
such as a methodical approach to practical engineering problems, teamwork, working in
interdisciplinary groups and time management.

In previous publications [20-21] we explained the impact of mobile learning AR
technologies introduced in engineering degrees on the academic results of our students,
having found that they increased their motivation and satisfaction in the classroom.


3 Case Study


This section presents a teaching methodology for a practical course in an architecture
degree in which students work with AR and VR technologies on their own mobile devices.
The course design follows previous examples [23] of Moodle-based evaluation systems for
the current requirements within the EEES on new skills for professional technicians,
such as spatial vision, orientation or teamwork.

At the same time, to test the accuracy of, and user satisfaction with, GPS systems,
available only on smartphones and iOS devices, we developed an Android tool (RA3) based
on markers as location encoders (i.e. markers with regular shapes, such as QR code-like
markers) associated with specific points of the environment or objects.


Fig. 2. On the left, an iOS screen displaying options in ArPlayer by ArMedia. On the
right, the RA3 application developed for Android devices.
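Using a marker as a location encoder amounts to chaining coordinate transforms: a tracker (assumed here, not shown) estimates the marker's pose relative to the camera, and the model is stored with a fixed offset relative to the marker. A minimal sketch with 4x4 homogeneous matrices follows; it restricts rotations to the vertical axis for brevity, and it is not taken from RA3, whose internals the paper does not describe.

```python
import numpy as np


def pose(rotation_z_deg: float, translation) -> np.ndarray:
    """4x4 homogeneous transform: rotation about the z axis, then translation."""
    t = np.radians(rotation_z_deg)
    c, s = np.cos(t), np.sin(t)
    T = np.eye(4)
    T[:3, :3] = [[c, -s, 0.0], [s, c, 0.0], [0.0, 0.0, 1.0]]
    T[:3, 3] = translation
    return T


def model_in_camera(T_cam_marker: np.ndarray, T_marker_model: np.ndarray) -> np.ndarray:
    """Model pose in camera coordinates = marker pose composed with the model's offset."""
    return T_cam_marker @ T_marker_model


def transform_points(T: np.ndarray, points) -> np.ndarray:
    """Apply a homogeneous transform to an (N, 3) array of points."""
    pts = np.asarray(points, dtype=float)
    homogeneous = np.hstack([pts, np.ones((len(pts), 1))])
    return (homogeneous @ T.T)[:, :3]
```

Because the model offset is defined relative to the marker, the model stays anchored to the same physical point however the camera moves, as long as the marker remains in view.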


3.1 Methodology 
The proposed course focused on two points: 


• On the one hand, the acquisition of knowledge is structured as an inverted pyramid:
students cannot perform an activity without having completed and assimilated the
previous one. Therefore, only students who have built a 3D model will be able to insert
it into a landscape or photograph according to its geometrical space or perspective.
Similarly, only smartphone owners are able to run the AR applications for iOS
platforms. To separate mobile device users from the rest of the class, all students
completed a pre-test that defined two main groups: a control group and an experimental
group.

• On the other hand, working with the proposed methodology not only helps students to
improve their spatial skills (by comparing their 3D proposals displayed in their real
location, which allows them to understand and correct common design errors,
particularly regarding the size of the models) but also improves the educational
proposal itself, by identifying strengths and weaknesses in the usability of the
method.
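The inverted-pyramid rule and the pre-test split can be expressed as a small gating check. The sketch below is hypothetical: the activity identifiers stand in for the four exercises listed in Section 3.2, and the group-assignment rule (owning a mobile device and passing the pre-test) is an assumption consistent with, but not specified by, the text.

```python
from typing import Optional, Set

# Hypothetical identifiers for the four course exercises, in course order
ACTIVITIES = ["sculpture_database", "photomontage", "square_model", "urban_intervention"]


def can_start(activity: str, completed: Set[str]) -> bool:
    """Inverted-pyramid rule: every earlier activity must already be completed."""
    idx = ACTIVITIES.index(activity)
    return all(prev in completed for prev in ACTIVITIES[:idx])


def next_activity(completed: Set[str]) -> Optional[str]:
    """First activity in course order that has not been completed yet, or None when done."""
    for act in ACTIVITIES:
        if act not in completed:
            return act
    return None


def assign_group(owns_mobile_device: bool, passed_pretest: bool) -> str:
    """Assumed pre-test rule: only device owners who pass join the experimental group."""
    return "experimental" if owns_mobile_device and passed_pretest else "control"
```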
During the course designed at the Architecture University of Barcelona (ETSAB-UPC),
four main exercises were developed in order to evaluate particular skills linked to
architectural and engineering careers, such as spatial perception, orientation or
occlusion. These kinds of abilities can also be addressed with specific AR experiences
[24].


3.2 Contents 


The first activity of the course was to generate a database of 3D sculptures by Andreu
Alfaro. In the second part of the course, these virtual sculptures had to be integrated
into a nineteenth-century square through photographic restitution. The third exercise
was the virtual representation of the chosen architectural environment, one of the few
arcaded squares of Barcelona, the Plaza Masadas. Finally, every student proposed their
own urban intervention in accordance with the regulations and urban plans.


Fig. 3. Two examples of photographic proposals of 3D sculptures in the middle of Plaza
Masadas, Barcelona


In the photographic proposals of the object or piece in the middle of a square, the
realism of the image can be diminished if the ambient occlusion or the point of view of
the two images (the real square and the 3D sculpture) are inconsistent. Lighting, for
example, is a dynamic element of realism that produces shadows which, when missing,
break the realistic effect of AR. To avoid ambient occlusion contradictions, the
students were required to select several properties, such as color, reflection or
material, and to use tools that take the latitude and time of day into account when
rendering the 3D models in Artlantis, V-ray or 3DStudioMax, to offer a more interactive
real environment [25]. Then, the Photomatch option of SketchUp was used to match the
3D model to the photograph of the chosen square according to its point of view.
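The role of latitude and time of day in the lighting setup can be illustrated with the standard solar-elevation formula. The sketch below is an approximation, assuming Cooper's equation for solar declination and local apparent solar time; it ignores the equation of time, longitude correction and atmospheric refraction. The renderers named above provide their own sun systems, and this code is not taken from them. Barcelona lies at roughly 41.4° N.

```python
import math


def solar_declination_deg(day_of_year: int) -> float:
    """Approximate solar declination in degrees (Cooper's equation)."""
    return 23.45 * math.sin(math.radians(360.0 * (284 + day_of_year) / 365.0))


def solar_elevation_deg(latitude_deg: float, day_of_year: int, solar_hour: float) -> float:
    """Sun elevation above the horizon; solar_hour is apparent solar time (12.0 = solar noon)."""
    phi = math.radians(latitude_deg)
    delta = math.radians(solar_declination_deg(day_of_year))
    hour_angle = math.radians(15.0 * (solar_hour - 12.0))  # 15 degrees per hour from noon
    sin_alt = (math.sin(phi) * math.sin(delta)
               + math.cos(phi) * math.cos(delta) * math.cos(hour_angle))
    sin_alt = max(-1.0, min(1.0, sin_alt))  # guard against rounding just outside [-1, 1]
    return math.degrees(math.asin(sin_alt))
```

At that latitude, `solar_elevation_deg(41.4, 81, 12.0)` gives about 48.6° for solar noon near the March equinox and `solar_elevation_deg(41.4, 172, 12.0)` about 72° near the June solstice; such differences in sun height change shadow length considerably, which is why a missing or mismatched shadow is so noticeable in a photomontage.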

The third part of the practical course added teamwork to the previously evaluated skills
of geometric modeling, spatial visualization or orientation, and ambient occlusion.
Different segments of the existing arcaded buildings around the square had to be
developed separately by two-partner groups, according to the urban plans, which
provided for the reconstruction of one corner of the square. The extent of the
adjustments needed to connect each segment with the complete assembly determined the
group's first mark; the second mark came from the result of a supervised exam in which
every student had to represent part of a similar arcaded square in 3D.


Fig. 4. 3D model of the section of an arcaded square in Barcelona

The fourth exercise added physical and urban properties to the main 3D model. A
personal approach was required that discussed material, color, landscaping and urban
furniture in the proposed space. The grade for this project was obtained from two
perspectives rendered from a human point of view.

Before the final exercise, an experimental group composed of students who had passed
the “digital natives” pre-test worked with AR using two location strategies for 3D
models: marker-based and GPS-based location. The academic results finally obtained by
the students made it clear that this experience improved their spatial abilities, as
intended. The two main platforms for mobile devices, iOS and Android (ArPlayer and RA3,
respectively), determined the location strategy each student used to integrate their
own project into its real environment. Once the 3D model is placed in its real
environment, the application displays different interaction options, such as rotation,
scale and light orientation. By adjusting these options, each student should obtain a
final scene on their device in order to compare it with their previous virtual
representations and exercises.
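The rotation and scale options offered once the model is placed can be sketched as a transform of the model's vertices about a pivot point, for instance the point where the model touches the ground. This is a minimal, hypothetical illustration assuming a z-up coordinate system; the paper does not describe the actual interaction code of ArPlayer or RA3.

```python
import numpy as np


def adjust_model(vertices, yaw_deg=0.0, scale=1.0, pivot=(0.0, 0.0, 0.0)):
    """Rotate vertices about the vertical (z) axis and scale uniformly around a pivot point."""
    v = np.asarray(vertices, dtype=float)
    p = np.asarray(pivot, dtype=float)
    t = np.radians(yaw_deg)
    R = np.array([[np.cos(t), -np.sin(t), 0.0],
                  [np.sin(t), np.cos(t), 0.0],
                  [0.0, 0.0, 1.0]])
    # Move the pivot to the origin, rotate each row vector, scale, then move back
    return (v - p) @ R.T * scale + p
```

A uniform `scale=1.25` would correspond to an enlargement of roughly 25%, the order of adjustment the students applied on site according to the conclusions.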


Fig. 5. Rendering of two projects from a human point of view
4 Conclusions


The teaching methodology explained in this paper is intended to introduce our degree
students to virtual reality and hand-held augmented reality (HHAR) for superimposing
virtual models on real scenes. Having previously developed test methods to confirm the
motivation of our students to work with VR and AR technologies, our next step will be
to determine the best resources and systems for introducing these techniques into the
educational community.

In later papers, the implementation of this methodology in a practical course at the
Architecture University of Barcelona (ETSAB) will provide information about advances
and user results regarding different issues:


• VR software and rendering
• AR applications
• GIS (geographical information) systems on mobile devices


Computer graphics have become much more sophisticated and realistic. In the near
future, researchers plan to display graphics on TV screens or computer displays and
integrate them into real-world settings. In this context, the geometrical formulation of
3D architecture for virtual representation is now possible with SketchUp, Rhinoceros or
AutoCAD, thanks to their compatibility with DXF or DWG files, to generate a database.

In the field of architecture, virtual reality rendering requires several ambient
occlusion options, such as color, reflection or material, using tools and files that
allow the latitude and time of day to be entered. Based on these premises, we will work
with Artlantis, V-ray or 3DStudioMax to offer more interactivity with the real-world
environment.
Fig. 6. Geographic information channel linked to a 3D model
The implementation of AR can be explored in various areas of knowledge, contributing
significantly to education. It offers great potential for the creation of interactive
books, allowing intuitive, easy-to-learn interaction. Building on our previous
experiences with AR applications, we decided to use ArMedia (iOS) and to develop a new
application for Android, RA3. The major difference between the two platforms that
display AR services is the GIS (geographical information system): iOS works with GPS,
whereas Android needs a marker based on regular shapes (e.g. QR codes) as a location
encoder. GPS systems are not currently accurate enough to aid in the teaching of
architecture. Therefore, in urban planning it is recommended to replace GPS with
location based on shapes or QR codes.


Fig. 7. Comparison of composed images by different students from the experimental
group, and the process of adapting the proposal to the correct size


Fig. 8. Student proposals with compositions more similar using AR 
Analyzing the experience, and accepting that we are in the first feasibility-study phase
of the methodology, the first conclusion of the exercise is that the students detect the
correct point of view after placing the 3D geometry in the scene from its
photo-composition. In other words, they are not capable of interpreting the information
in the EXIF file, a situation that can lead to a great disparity of data because of the
lack of homogeneity among sensors. With this procedure, the angle of vision of a flat
monocular image is reduced (a relatively narrow field of 40-45°), very different from
the panoramic field of view of a human observer. For this reason, the proposed
sculptures end up smaller than intended and, as happened with the students who carried
out the experiment, must be adjusted in the final step using AR. In relative terms, the
sculptures were enlarged by around 25% once the students were on site and could see the
size of the square firsthand. This adjustment was similar on iOS and Android devices,
and with both 4-inch and 7-inch screens, which indicates that screen size is not
significant.
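The field of view of a photograph can in principle be recovered from its EXIF metadata, which typically records the focal length and often a 35 mm-equivalent focal length (the `FocalLengthIn35mmFilm` tag). A minimal sketch, assuming a rectilinear lens and using the 36 mm width of a full-frame sensor for the equivalent focal length; the function names are hypothetical.

```python
import math


def horizontal_fov_deg(focal_length_mm: float, sensor_width_mm: float) -> float:
    """Horizontal angle of view of a rectilinear lens: 2 * atan(w / (2 f))."""
    return math.degrees(2.0 * math.atan(sensor_width_mm / (2.0 * focal_length_mm)))


def fov_from_35mm_equivalent(focal_35mm: float) -> float:
    """Horizontal angle of view from a 35 mm-equivalent focal length (36 mm wide frame)."""
    return horizontal_fov_deg(focal_35mm, 36.0)
```

`fov_from_35mm_equivalent(50.0)` evaluates to about 39.6°, close to the relatively narrow 40-45° field mentioned above; when only the physical focal length is available, the actual sensor width must also be known, which is where differences between device sensors come into play.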

Regarding the use of markers, six projects were delivered: two used markers and four
were geo-referenced. All students described some relative difficulty with the fine
adjustment of the models, although this was not insurmountable. On the other hand, the
initial location of the object was considered easier using the marker, after which the
students proceeded to move, rotate and scale the model to its final location. The only
disadvantage is that the marker must always be visible in the scene.

For the students who used geo-referencing, the most difficult initial step was to locate
the object in the square, given the lack of accuracy of the mobile phones' GPS, which
forced them to move around the square in addition to adjusting the height relative to
the observer. The best way to facilitate this first approach is to place a QR code at
the location from which the model can be downloaded.

To conclude, we can affirm that the experiment is viable and, if we can corroborate
these results in the future with a larger sample of users, we will be able to affirm
that these experiments demonstrate the suitability of the method for solving these types
of urban design problems. Similarly, we can initially affirm that the students felt
comfortable and were very motivated by this type of experiment compared with
traditional classes, devoting more hours than expected, which produced quality work and,
consequently, an increase in their marks, which are currently being evaluated.
Abstract. The paper at hand describes the need to develop didactically designed
Virtual Reality (VR) based learning environments. Changing industrial processes
triggered by the fourth industrial revolution will influence working and learning
conditions. VR based learning environments have the potential to improve the
understanding of complex machine behavior. The paper describes possibilities for the
investigation and documentation of expert knowledge as a crucial source for developing
VR scenarios. Considering the learning objectives and the learners' current know-how is
essential for designing an effective learning environment. The basic theoretical
approaches of didactics and their application to virtual learning environments will be
presented using the example of the maintenance of a high voltage circuit breaker.
Finally, experiences from practical use will be reflected upon, and next steps toward a
user-specific learning environment will be discussed.


Keywords: Virtual Reality, Maintenance, Expert knowledge, learning theory, 
learning objectives. 


1 Motivation 


German industry is on the threshold of the fourth industrial revolution, also called
“Industry 4.0”. It is driven by the increasing integration of internet technologies with
traditional industries such as manufacturing, which will lead to increasingly autonomous
production processes.

The core of the revolution is the complete penetration of industry, its products and its
services by software, while products and services are connected via the internet and
other networks. This change leads to new products and services that change the lives
and work of all people [1].
The technological changes are accompanied by demographic change [2]: experienced
expert workers will retire, while there are not enough well-qualified young people. The
challenge for companies is to integrate experiential knowledge with the latest knowledge
and information of young professionals. A major part of experiential knowledge is tacit
knowledge [3], which means that people are not aware of their experience and have
difficulty verbalizing it. The first section of the paper at hand will give a short
overview of how this knowledge can be investigated.

Changing industrial processes will also change the maintenance process. Maintenance
technicians will take on a responsible role within production, with the objectives of
minimizing machine downtime and planning procedures with regard to resource and energy
efficiency.

The vocational training of maintenance technicians faces the challenge that many
machines are not available for training, cannot be used because of dangerous processes,
or are hard to comprehend. Other challenges arise from the insufficient visibility of
important assemblies and the growing number of invisible network processes.
Understanding the processes, and the resulting confidence, are prerequisites for safe and
efficient maintenance procedures [4].

Technology-based learning environments built on virtual 3D models can overcome the
restrictions of today's learning methods. For planning and arranging such learning
environments in a target-oriented way, didactical designs are essential. Didactics
describes a system of conditions and interdependent decisions which encompass all
factors of teaching and learning in a target-oriented practice. In this context,
didactics refers to the following criteria: identification of learning objectives, the
content of learning, the application of methods, media, and the pedagogical field in
which teaching and learning are situated. All of these points are interdependent and
have to be considered in their respective context [5-6].

The paper at hand will describe the conceptual design of such a learning environment
from the perspective of didactics. The above-mentioned didactical aspects and their
application to the learning environment will be presented, using the maintenance of a
high voltage circuit breaker as an illustrative example.


2 The Role of Expert Knowledge for Maintenance Processes 


There are many tasks in the production process that cannot be taught solely in
classroom training, as these tasks are very complex and require decision-making ability.
This kind of problem-solving competence can only be gained within the working process
and goes along with experiential knowledge. In the maintenance process, there are
comparatively easy tasks that follow a defined checklist and do not require special
know-how. On the other hand, there are very complex tasks, e.g. failure analysis, which
requires the ability to reflect on the current situation and compare it with similar
experiences from working life [7].

“A main part of this experiential knowledge is the so-called tacit knowledge.
Technicians are often not aware of this special knowledge, which becomes noticeable
when they cannot verbalize their knowledge. So, tacit knowledge is more than just the
knowledge systematically acquired within vocational education; it is the result of the
workers' technical handling in their everyday work life.” [8-9] [10]

For the design of VR based learning environments it is therefore relevant to take
experiential knowledge into account and to offer technicians opportunities to gain
experience in the virtual world.

The challenge of bringing experience into VR lies in investigating tacit knowledge, as it
is tied to persons and situations. Narrative methods have shown their potential for
eliciting valuable knowledge from experts through storytelling. One method applied by
the authors is the so-called triad interview [11].

“It is characterized by a locally and timely defined dialogue about an agreed topic
where three persons with very specific roles take part:


• The narrator as the technical expert for the topic is responsible for the validity of
the knowledge.

• The listener as the novice technician who wants to learn from the expert is
responsible for the usefulness of the knowledge.

• The technical layperson who is the methodical expert and moderator and who is
responsible for the comprehensibility.” [10]


So far, triad interviews have been documented as texts, which are not optimal for
application within the organization because novice technicians have limited affinity
for written texts. Virtual Reality has the potential to keep the narrative structure of
stories and is therefore very well suited to transferring experiential knowledge while
allowing easy access for novice users.


3 Theoretical Basis for Learning in Virtual Worlds 


We have already emphasized the importance of making experts' tacit knowledge explicit
and of using the potential of VR for its documentation. In this section, we start by
identifying basic learning theories for a suitable didactic design of virtual learning
environments. Learning environments should provide certainty of action, especially in
dangerous situations and activities that require a high level of competence. If a
physical prototype does not yet exist, qualification is already possible during the
development process.

VR based learning environments can be classified according to Leontjev's activity
theory [12]. According to this theory from 1977, the knowledge of employees is
represented not only in their heads but also in their working activities. In 1987,
Engeström [13] extended this theory with aspects of learning and development processes.
The result is the so-called activity system (Fig. 1), which contains the subject (e.g.
the acting technician), the object (e.g. a maintenance task) and the integration into a
community of practice (e.g. a group of experts for the maintenance of a special device).
Furthermore, it is embedded in an organization whose rules and values influence the
handling and decision making of employees. All parameters of the activity system
influence the outcome.
The quality of the outcome can be improved by providing a well-designed didactical
learning setting, which is represented in the triangle of subject, object and mediating
artifacts.

The paper at hand focuses on this triangle of the activity system, describes basic
didactic theories from vocational education and relates them to learning theories with
respect to their application in VR based learning environments.


[Figure: activity system with the nodes Subject, Mediating artifacts (tools and signs),
Object → Outcome, Rules, Community and Division of labor]


Fig. 1. “Activity- and learning theory” according to Engeström 1987 [13]


The learning objectives are defined based on the learner’s current state of knowledge. To do so, the following aspects have to be taken into account: the learner, types of knowledge, learning objectives, the learning content and organizational structures within the company.


3.1. The Learner in the Context of a Community 


According to Dreyfus and Dreyfus [14], becoming an expert in a domain depends heavily on a developmental progression from novice through advanced beginner to expert.

The authors identify “(...) five stages of competence development and the four corresponding developmental learning areas”. These stages “have a hypothetical function for the identification of thresholds and stages in the development of occupational competence and identity” [15]. However, they also have a didactic function in the development of work-related and structurally oriented vocational courses.

For a person whose skills are developing from deficient to competent, Lave and Wenger [16] state that the quality of the learning situation is crucial to the learning outcome [15]. They point out that learning, as a path from inability to ability, is accomplished as a process of integration into the community of practice of those who already demonstrate expertise [15].
3.2. Knowledge Types


There are many different terminologies to differentiate between types of knowledge. 
Anderson et al. [17] identify in their taxonomy of learning outcomes four major cate- 
gories of knowledge relevant across all disciplines: 


1. Factual knowledge, 

2. Conceptual knowledge, 

3. Procedural knowledge and 
4. Metacognitive knowledge. 


Factual knowledge consists of the basic elements of a domain. It includes knowledge of specific facts and terminology (bits of information). Conceptual knowledge refers to more general concepts and is based on the interrelation of basic elements within a larger structure that enables them to function together. It includes knowledge of categories, principles and models [17].

Both factual and conceptual knowledge constitute knowledge of “what”. The two other types, procedural and metacognitive knowledge, constitute knowledge of “how to” [18]. Procedural knowledge ranges from completing routine exercises to solving new problems and includes methods of inquiry as well as knowledge of procedures and of criteria for using skills, algorithms, techniques and methods. Metacognitive knowledge comprises “knowledge of cognition in general as well as awareness and knowledge of one’s own cognition” [17]. It includes knowledge of general strategies that might be used for different tasks under diverse conditions, and knowledge of the extent to which these strategies are effective [19]. From Anderson's [17] perspective, all these types of knowledge play complementary roles in problem-solving processes. A further concept in this context is work process knowledge. It describes a type of knowledge that guides practical work and, as contextualized knowledge, goes far beyond non-contextual theoretical knowledge (cf. Eraut et al., 1998, in [15]). Practical work process knowledge is characterized by the mastery of unpredictable work tasks and by fundamentally incomplete knowledge (a knowledge gap) about non-transparent, non-deterministic work situations. This is a special feature of vocational work. Meta-competence can be developed here, namely the ability to cope with the knowledge gap while solving unpredictable tasks and problems in vocational work [15].


3.3. Learning Objectives 


A learning objective describes the intended behavior as well as the specific knowledge, skills and attitudes of the learner that result from educational activities. The newly developed behavior has to be observable and verifiable. Learning objectives refer to three domains: cognitive (knowledge), affective (attitude or self) and psychomotor (skills).

In 1956 Benjamin Bloom created the taxonomy of learning domains, a classifica- 
tion system for learning objectives in order to promote higher forms of thinking in 
education. The cognitive domain includes knowledge and the development of intellec- 
tual abilities. [20] 
Bloom's taxonomy comprises six major categories, ranging from the simplest behavior to the most complex one, as shown in the following illustration:


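[Figure: levels of Bloom's taxonomy allocated to the main learning theories; recoverable labels: Knowledge, Comprehension, Analysis, constructivism]


Fig. 2. Allocation of Bloom's learning taxonomy to the main learning theories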


Applying the learning taxonomy does not by itself ensure a sound didactic implementation; first and foremost, the taxonomy describes the cognitive process of gaining knowledge. Together with existing knowledge about learning theories and the analysis of learning environments, however, it is essential for the professional didactic design of technology-based learning environments [21]. The following section describes the characteristics of the learning theories behaviorism, cognitivism and constructivism and their relation to Bloom's taxonomy.


Behaviorism. Behaviorist forms of learning are suitable for simple learning processes that establish a stimulus-response connection, leading to a permanent change in behavior [22]. This learning theory can be assigned to the lower levels of Bloom's taxonomy, e.g. learning facts or manual operations.


Table 1. Behaviorism — Pros and Restrictions 


Behaviorism

Pros: This approach is suitable only for triggering associations in simple forms of learning (e.g. learning vocabulary, factual knowledge).
Restrictions: Complex situations that require a lot of knowledge cannot be conveyed with this approach.
Example: Facts about the technical equipment (e.g. facts, parameters, ...)


Cognitivism. The cognitive approach focuses on the mental processing of information into knowledge. Every learning process is an active construction of knowledge. The interaction between the external supply of information and the mental processes of recognition requires the learner to integrate new knowledge with prior knowledge [23].


Table 2. Cognitivism — Pros and Restrictions 


Cognitivism

Pros: This approach is suitable for integrating information in a goal-oriented process in order to develop cognitive maps. Learning content can be adjusted automatically to the user's skills (e.g. model learning, training videos, ...).
Restrictions: This approach offers no opportunity to reflect on or verify the learning context, and it does not consider the social aspect of learning.
Example: Visualization of best-practice solutions for a maintenance task.

Constructivism. Constructivism is an epistemology founded on the assumption that, by reflecting on our experiences, we construct an individual understanding of the world we live in. Each of us generates our own “rules” and “mental models”, which we use to make sense of our experiences. In essence, a constructivist learning environment provides real-world or problem-based learning situations that focus on authentic learning [24]. The theory of situated learning is one aspect of constructivism; it claims that every idea and human action is a generalization adapted to the ongoing environment. From this perspective, learning situations are characterized by complex, multi-perspective and problem-containing requirements.


Table 3. Constructivism — Pros and Restrictions 


Constructivism

Pros: Due to its interactive character, VR offers potential for implementing ‘learning by doing’. This allows users to gain experience in a safe environment, independently of the availability of the real machine.
Restrictions: The application needs to be designed carefully with respect to the user's level of knowledge: the high degree of freedom might be challenging for novice users, as they can feel lost in the application. In this case user-specific assistance is required.
Example: Interactive exploration of the virtual learning environment, including the functionality and components of the technical device.

4 Example: Maintenance Training for a High Voltage Circuit Breaker


The following section describes a learning application that was developed for the maintenance of a high voltage circuit breaker [25]. The learning environment is intended to prepare technicians to act safely and confidently in their future working environment. In this regard, being able to internalize how to handle dangerous and complex processes is essential, as this knowledge forms the basis for safe handling in the real working environment [7].

Consequently, the first learning module deals with the exploration of the high voltage circuit breaker and its two main components: the three pole columns and the operating mechanism.

Until now, 2D drawings (e.g. exploded-view drawings) have been widely used in technical training and are well known from user manuals. Assigning the assembly parts and their names with the help of these drawings is a behaviorist learning strategy that may be suitable for simple assembly structures. For the operating mechanism, which contains many individual parts, a 2D drawing (Fig. 3a) is of limited use because it shows only one predefined perspective and exactly one state of the device. Transferring this to other states of the operating mechanism requires a higher capacity for abstraction.

The virtual learning environment introduces a constructivist approach that connects the well-known 2D drawing with an interactive model (Fig. 3b) of the operating mechanism. Users can explore the device individually according to their needs. When the environment is used in classroom training, the mechanical behavior is no longer presented only in a teacher-centered way: users can explore the components and their functionality by interacting with the virtual model on a laptop or an immersive VR system. In summary, factual knowledge can be conveyed in a behaviorist manner for comparatively simple models (e.g. simple assignment tasks). For more complex systems, it is recommended to extend this to a constructivist approach.


Fig. 3. (a) 2D drawing and (b) virtual model of the operating mechanism
In the second learning module, technicians can familiarize themselves with best-practice solutions for selected maintenance tasks. The visualized tasks were chosen because of their relevance and complexity. The work processes were discussed together with the technical experts, and it became apparent that a discussion supported by a visual tool is much more intensive than merely talking about a process. Before VR-based environments were used, the work process was explained with written manuals, checklists and videos. Using a video is a mainly cognitivist approach in which the learner observes another person handling a situation and transfers the knowledge gained to their own task. In many situations videos are well suited, e.g. because they can be made available on mobile devices and help learners recall individual work steps.

For more complex tasks, feedback from the system and the opportunity to interact and obtain further information are necessary. In the virtual learning environment, a work step is described by a set of predefined animations and actions that were developed together with the technical experts and enhanced by additional information that can be accessed from the virtual scene or from a checklist. This design follows a constructivist approach.
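The structure described here (predefined animations and actions per work step, additional information reachable from the scene or a checklist, and feedback from the system) can be sketched as follows. This is a hypothetical illustration, not the data model of the application in [25]: the classes `WorkStep` and `Checklist`, the string-based actions and the feedback messages are our own assumptions, and the step names in the example are placeholders rather than real maintenance procedures.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class WorkStep:
    """One work step of a best-practice solution (hypothetical structure)."""

    title: str
    animations: list[str]  # predefined animations agreed with the experts
    actions: list[str]     # predefined user actions, in the required order (non-empty)
    info: dict[str, str] = field(default_factory=dict)  # extra media, keyed by scene object


class Checklist:
    """Tracks the learner's progress through the steps and returns feedback text."""

    def __init__(self, steps: list[WorkStep]) -> None:
        self.steps = steps
        self.current = 0   # index of the step the learner is working on
        self.progress = 0  # number of actions already completed in that step

    def perform(self, action: str) -> str:
        """Compare an action with the expected next action and give feedback."""
        if self.current >= len(self.steps):
            return "All steps completed."
        step = self.steps[self.current]
        expected = step.actions[self.progress]
        if action != expected:
            return f"Not yet: expected '{expected}' in step '{step.title}'."
        self.progress += 1
        if self.progress == len(step.actions):
            self.current += 1
            self.progress = 0
            return f"Step '{step.title}' completed."
        return "Correct."

    def lookup(self, scene_object: str) -> str | None:
        """Additional information for an object in the current step, if any."""
        if self.current >= len(self.steps):
            return None
        return self.steps[self.current].info.get(scene_object)


# Placeholder steps; real steps would be defined together with the technical experts.
checklist = Checklist([
    WorkStep("Step 1", ["anim_1"], ["open", "inspect"], {"part_x": "info about part_x"}),
    WorkStep("Step 2", ["anim_2"], ["close"]),
])
print(checklist.perform("inspect"))  # out of order, so the learner gets a hint
print(checklist.perform("open"))
```

Checking each action against a fixed sequence is the simplest way to give the feedback the text calls for; a real application might also accept alternative orders where the experts allow them.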


Fig. 4. Best practice solution combining animations, a checklist and different media 


The learning modules are suitable for learning groups of different levels of expertise. 
For the learning application presented, the taxonomy of Bloom can be interpreted as 
follows: 


1. Knowledge: The learner can assign the parts and assembly groups of the pole column and the operating mechanism of one type of high voltage circuit breaker, knows how SF6 (the insulating gas within the pole column) behaves under compression, and knows in which order the work steps have to be executed.

2. Comprehension: The learner explains the functional processes of the operating mechanism, the pole column and the coupling between them, as well as the changes in gas and electricity during operation.

3. Application: The learner can apply the knowledge to other types of high voltage 
circuit breakers. 
4. Analysis: The learner can divide a real task into subtasks and can use his/her
knowledge for problem-solving. Factual knowledge and process knowledge are 
used for understanding, analyzing and solving the problem. 

5. Synthesis: Learners can solve problems that were not part of their qualification. 
Because of their knowledge and experience they can recognize relations and de- 
velop new solutions. 

6. Evaluation: The learner has a broad overview from the technical as well as the economic point of view. This enables him/her to choose between different solutions in line with the company's interests. The learner is also able to pass on his/her knowledge to colleagues in a suitable manner.
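Because the six levels are ordered from the simplest to the most complex behavior, they map naturally onto an ordered enumeration. The sketch below is a hypothetical encoding: the names `BloomLevel`, `OBJECTIVES` and `prerequisites` are ours, the objective texts are shortened paraphrases of the list above, and `prerequisites` assumes that the taxonomy is cumulative, i.e. that each level builds on all lower ones.

```python
from __future__ import annotations

from enum import IntEnum


class BloomLevel(IntEnum):
    """Cognitive levels of Bloom's taxonomy, ordered from simplest to most complex."""

    KNOWLEDGE = 1
    COMPREHENSION = 2
    APPLICATION = 3
    ANALYSIS = 4
    SYNTHESIS = 5
    EVALUATION = 6


# Shortened paraphrases of the objectives for the circuit-breaker application.
OBJECTIVES = {
    BloomLevel.KNOWLEDGE: "assign parts and assembly groups; know SF6 behavior and step order",
    BloomLevel.COMPREHENSION: "explain the functional processes of mechanism and pole column",
    BloomLevel.APPLICATION: "apply the knowledge to other circuit-breaker types",
    BloomLevel.ANALYSIS: "divide a real task into subtasks and solve it",
    BloomLevel.SYNTHESIS: "solve problems that were not part of the qualification",
    BloomLevel.EVALUATION: "choose between solutions in the company's interest",
}


def prerequisites(level: BloomLevel) -> list[BloomLevel]:
    """All lower levels, assuming the taxonomy is cumulative."""
    return [lvl for lvl in BloomLevel if lvl < level]


print([lvl.name for lvl in prerequisites(BloomLevel.ANALYSIS)])
```

Using `IntEnum` makes the ordering explicit, so a learning module can, for instance, check whether a learner's target level lies above the levels already covered.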


5 Experiences From Practical Use 


“In order to evaluate the VR learning application a quasi-experimental pretest- 
posttest-follow-up control group design was chosen, with a group of trainees being 
taught traditionally by a trainer as the control group (TT) and a second equally sized 
group of trainees using the virtual reality application as the experimental group (VR). 
Hence, the trainees were assigned randomly to two groups of 10 persons each. In each 
training session participated 5 trainees and each one took 8 hours of work. This applies for both conditions, the TT training and VR training as well.” [10] The evaluation revealed a very high acceptance of the learning environment among users. Acceptance was even higher among the more experienced workers, as they could imagine situations in which using the virtual environment would have improved their understanding of processes and therefore their job performance.
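The allocation described in the quotation (20 trainees randomly split into a TT group and a VR group of 10 each, trained in sessions of 5) can be reproduced with a short script. This is a minimal sketch under our own assumptions, not the procedure used in [10]: the function `assign_groups`, the trainee IDs and the fixed seed are hypothetical, and seeding only serves to make the allocation reproducible.

```python
from __future__ import annotations

import random


def assign_groups(trainees: list[str], seed: int | None = None,
                  session_size: int = 5) -> dict[str, list[list[str]]]:
    """Randomly split trainees into two equal groups (TT, VR), each chunked into sessions."""
    if len(trainees) % 2:
        raise ValueError("an even number of trainees is needed for equally sized groups")
    rng = random.Random(seed)  # independent, optionally seeded generator
    shuffled = list(trainees)  # copy, so the caller's list stays unchanged
    rng.shuffle(shuffled)
    half = len(shuffled) // 2
    groups = {"TT": shuffled[:half], "VR": shuffled[half:]}
    # Split each group into consecutive training sessions of `session_size` trainees.
    return {
        name: [members[i:i + session_size] for i in range(0, len(members), session_size)]
        for name, members in groups.items()
    }


trainees = [f"T{i:02d}" for i in range(1, 21)]  # 20 hypothetical trainee IDs
plan = assign_groups(trainees, seed=1)
for group, sessions in plan.items():
    print(group, sessions)
```

Shuffling once and cutting the list in half guarantees equally sized groups, which independent coin flips per trainee would not.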

The learning environment was easy to use for the whole peer group, although it became apparent that the learning environment should be tailored to the user group. Both the design and the presentation of the content need to be adapted for older users, e.g. by referring to experiences from their earlier working life.


6 Summary and Outlook 


This paper has shown the necessity of considering basic approaches of didactic design when developing VR-based learning environments. Based on Engeström's activity theory, the outcome of the learning system can be influenced by considering the learner, the learning object and the media used for learning. In the technical domain this paper focuses on, especially maintenance, Virtual Reality shows high potential for learning applications, as it allows a very clear and understandable visualization of complex technical processes. Learning in the real working environment is often limited because the equipment is not available or handling it is very dangerous; learning in VR carries no such risk. Besides this, Virtual Reality allows learners to solve learning tasks interactively in the sense of situated learning and the constructivist approach to learning.
Future work will focus on the visualization of experiential knowledge. As experiential knowledge is mainly tacit, narrative methods for eliciting it, as described in this paper, need to be developed and applied. They can be improved by VR-based applications that allow stories to be documented during the interview process itself while preserving their narrative structure, which should improve the transfer process. Furthermore, access to and presentation of knowledge will be designed adaptively according to user characteristics (age, prior knowledge).
Abstract. After finishing their studies, graduates need to apply their knowledge in a new environment. Virtual reality (VR) simulators can be used to prepare students professionally for new situations. In our research, such a simulator is used to enable visits to remote laboratories, which are designed with advanced computer graphics to create simulated representations of real-world environments. In this way, we aim to facilitate access to practical engineering laboratories.

Our goal is to enable students of technical subjects to visit elusive or dangerous places safely. The first step towards the virtualization of engineering environments, e.g. a nuclear power plant, is the development of demonstrators. In this paper, we describe the development of an industry-relevant demonstrator for the advanced teaching of engineering students. In our approach, we use a virtual reality simulator called the “Virtual Theatre”.


Keywords: Virtual Reality, Virtual Theatre, Remote Laboratories, Immersion. 


1 Introduction 


Modern engineering teaching can draw on various approaches to impart knowledge to students. Traditional teaching techniques, which rely on written texts or the spoken word, are still suitable for most knowledge transfer. However, due to the increasing number of study programs and the specialization of technically oriented classes in particular, new media need to be integrated into the curricula of most students [1]. Thus, visualizing educational content to make theory more concrete and tangible has gained importance. Not least because of progress in computer science and graphical visualization, the capabilities for visualizing objects of interest within an artificially designed context have grown considerably. Not only have visualization techniques advanced; the ways of distributing knowledge through teaching media have also expanded. One major improvement in reaching students independently of their location is e-learning platforms [2]. These technical possibilities for sharing and representing content open up new opportunities in teaching and learning.
Thus, in nearly all study programs, new media have gained high significance over the past decade. These new media are continuously replacing conventional media, in other words traditional, static teaching approaches using books and lecture notes. The new media are mostly based on methods of digital visualization [3], e.g. presentation applications like PowerPoint [4]. This switch from the traditional lecture to graphical representations has taken place because this form of presentation makes it possible to focus on the main points of educational content using illustrative representations and pictorial summaries [5]. Despite both positive [6] and critical discussion about the overwhelming use of PowerPoint [7–9] as the primary teaching tool [10], the use of presentation software in the classroom has grown steadily [11].

Applications like PowerPoint may be a far-reaching advancement for most university courses. However, even these IT-based teaching aids are limited to a certain kind of knowledge transfer. Practically oriented study programs such as engineering courses in particular have an urgent need for interaction possibilities. In these highly technical studies, teaching staff face more and more obstacles in imparting their knowledge tangibly. Due to the advanced and complex technology level of the relevant applications [12], progressive methods have to be applied to fulfill the desired teaching goals. To make problem-based learning methodologies available [13], novel visualization techniques have to be implemented.

Studies of astronautics or nuclear research can serve as an incisive example of the need for innovative visualization capabilities. In astronautics studies, teaching staff face insurmountable obstacles if they want to impart practical knowledge about space travel to students using theoretical approaches alone. To gain deep, experiential knowledge about the real situations an astronaut has to face, realistic scenarios have to be staged. This can be done, for instance, by setting up expensive real-world demonstrators that provide practical experience of space-travel events, e.g. by using actual acceleration.

However, there is also a need for a visual representation of the situation. To fulfill the requirements of a holistic experience, these visualization techniques need to provide an immersive representation of the virtual world scenario. In this connection, the term immersion is defined according to Murray [14] as follows: “Immersion is a
metaphorical term derived from the physical experience of being submerged in water. 
We seek the same feeling from a psychologically immersive experience that we do 
from a plunge in the ocean or swimming pool: the sensation of being surrounded by a 
completely other reality, as different as water is from air that takes over all of our 
attention, our whole perceptual apparatus.” 

An experience can only be impressive enough to impart experiential knowledge if the simulation of a virtual situation has an immersive effect on the user's perception. Our latest research on creating virtual world scenarios has shown that immersion has a strong impact on the learning behavior of students [15]. With the aim of improving study conditions for students of astronautics, our first demonstrator was a Mars scenario [16]. Using novel visualization techniques in combination with realistic physics engines, we created a realistic representation of a plateau on the red planet.
In our next research phase, we want to go further and increase the capabilities for interacting with the virtual environment the user is experiencing. The Mars representation already offered a few interaction possibilities, such as triggering object movements or navigating vehicles [16]. However, this sort of interaction is based on rather artificial commands instead of natural movements with realistic consequences in the representation of the virtual world scenario.

Hence, in this paper, we introduce a more grounded scenario, based on the aforementioned idea of enabling visits to elusive or dangerous places such as an atomic plant. Accordingly, our first step in realizing an overall scenario of a detailed environment such as a power plant is the development of individual laboratory environments. In this context, we focus especially on the interaction capabilities within this demonstrator.

We pursue this goal by building a virtual prototype of an actual laboratory environment, which a user can access virtually and in real time in a virtual reality simulator. The realization of such demonstrators is also known as the creation of “remote laboratories”. In this paper, we describe the development, optimization and testing of such a remote laboratory. After a brief introduction to the state of the art of this comparatively new research field in chapter 2, our Virtual Reality simulator, which is used to simulate virtual environments in an immersive way, is described in chapter 3. In chapter 4, the technical design of the remote laboratory, including its information and communication infrastructure, is presented. In the Conclusion and Outlook, the next steps towards the overall goal of a virtual representation of an engineering environment such as an atomic plant are pointed out.


2 State of the Art 


In the introduction, we concluded that innovative teaching methodologies have to be adopted to impart experiential knowledge to students. Thus, virtual reality teaching and learning approaches are examined in the following.

Nowadays, a large number of applications make use of immersive elements within real-world scenarios. The immersive character of all these applications is based on two characteristics of the simulation: the first is the quality of the three-dimensional representation; the second is the user’s identification with the avatar within the virtual world scenario.

The modeling quality of the three-dimensional representation of a virtual scenario is very important if the user is to feel surrounded by a realistic or even immersive virtual reality. However, a high-quality graphical representation of the simulation is not sufficient for an intensive experience. According to Wolf and Perron [17], the following conditions have to be fulfilled to enable an immersive user experience within the scenario: “Three conditions create a sense of immersion in a virtual
reality or 3-D computer game: The user’s expectation of the game or environment 
must match the environment’s conventions fairly closely. The user’s actions must 
have a non-trivial impact on the environment. The conventions of the world must be 
consistent, even if they don’t match those of the ‘metaspace’.”
The user’s identification with the virtual scenario is relatively independent of the modeling of the environment. It also depends on the user’s empathy with the “avatar”. Generally, an avatar is supposed to represent the user in a game or a virtual scenario. However, to fulfill its purpose with respect to the user’s empathy, the avatar has to provide further characteristics. Accordingly, Bartle defines an avatar as follows: “An
avatar is a player’s representative in a world. [...] It does as it's told, it reports what 
happens to it, and it acts as a general conduit for the player and the world to interact. 
It may or may not have some graphical representation, it may or may not have a 
name. It refers to itself as a separate entity and communicates with the player.” 

There are already many technical solutions that focus primarily on creating high-quality, complex three-dimensional environments that are accurate to real-world scenarios in every detail. Flight simulators, for example, provide vehicle tracking [18]: a flight simulator tracks the locomotion of a flying vehicle within the virtual world but does not take the user’s head position into account. Another VR simulator is the Omnimax Theater, which provides a large angle of view [19] but no tracking capabilities whatsoever. Head-tracked monitors were introduced by Codella et al. [20] and by Deering [21]. These monitors provide full tracking but only a rather limited angle of view [18]. The first attempt to create virtual reality in the sense of completely adjusting the simulation to the user’s position and head movements was the Boom Mounted Display by McDowall et al. [22]. However, these displays offered only poor resolution and thus could not provide a detailed graphical representation of the virtual environment [23].

To enable a comprehensive representation of the intended remote laboratories, we are looking for scenarios that meet immersive requirements through both detailed graphical modeling and a realistic experience within the simulation. One highly advanced visualization technology in this context is the Cave, developed in 1991. The recursive acronym CAVE stands for Cave Automatic Virtual Environment [18] and was first mentioned in 1992 by Cruz-Neira [24]. Interestingly, the name of the Cave is also inspired by Plato’s Republic [25]. In this book, Plato “discusses inferring reality (ideal forms) from shadows (projections) on the cave wall” [18] in “The Simile of the Cave”.

By combining complex projection techniques with multiple projectors and six projection walls arranged in the form of a cube, the developers of the Cave redefined the standards for visualizing virtual reality scenarios. The Cave provides multi-screen stereo vision while reducing the effect of common tracking and system latency errors. Hence, in terms of resolution, color and flicker-free stereo vision, the founders of the Cave created a new level of immersion and virtual reality.

The Cave, which provides an ideal graphical representation of a virtual world, brings us closer to true Virtual Reality, which, according to Rheingold [26], is an experience in which a person is “surrounded by a three-dimensional computer-generated representation, and is able to move around in the virtual world and see it from different angles, to reach into it, grab it and reshape it.” This enables various educational as well as industrial and technical applications. Hence, past research has already focused on the power of visualization in technical applications, e.g. for data visualization purposes [27] or for the exploration and prototyping of complex systems such as air traffic simulation systems [28]. Furthermore, the Cave has been used in medicine and in other applications that require the annotation and labeling of objects, e.g. in teaching scenarios [29].

The founders of the Cave chose an even more specific definition of virtual reality: “A virtual reality system is one which provides real-time viewer-centered head-tracking perspective with a large angle of view, interactive control, and binocular display.” [18] Cruz-Neira also mentions that, according to Bishop and Fuchs [30], the competing term “virtual environment (VE)” has a “somewhat grander definition which also correctly encompasses touch, smell and sound.” Hence, a holistic VR experience requires more interaction within the virtual environment.
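The “viewer-centered head-tracking perspective” and the “binocular display” in this definition can be illustrated with the standard off-axis (asymmetric-frustum) projection: the tracked eye position determines the viewing frustum through a fixed projection wall, and stereo is obtained by computing one frustum per eye. The sketch below is a generic textbook formulation, not the Cave's actual implementation; the wall placement in the plane z = 0, the names `Wall`, `off_axis_frustum` and `stereo_eyes`, and the default eye separation of 0.064 m are our assumptions. The returned values correspond to the left/right/bottom/top parameters of an OpenGL `glFrustum` call, combined with a view transform that translates the scene by the negative eye position.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class Wall:
    """Rectangular projection wall in the plane z = 0 (units: metres)."""

    x_min: float
    x_max: float
    y_min: float
    y_max: float


def off_axis_frustum(eye: tuple[float, float, float], wall: Wall,
                     near: float) -> tuple[float, float, float, float]:
    """Asymmetric frustum (left, right, bottom, top) at distance `near` from the eye.

    The eye is in front of the wall (z > 0) and looks along -z, so the frustum
    edges are the wall edges scaled onto the near plane by similar triangles.
    """
    ex, ey, ez = eye
    if ez <= 0:
        raise ValueError("the eye must be in front of the wall (z > 0)")
    scale = near / ez
    return ((wall.x_min - ex) * scale, (wall.x_max - ex) * scale,
            (wall.y_min - ey) * scale, (wall.y_max - ey) * scale)


def stereo_eyes(head: tuple[float, float, float], ipd: float = 0.064
                ) -> tuple[tuple[float, float, float], tuple[float, float, float]]:
    """Left and right eye positions, assuming the head faces the wall (eyes offset along x)."""
    hx, hy, hz = head
    return (hx - ipd / 2, hy, hz), (hx + ipd / 2, hy, hz)


# Tracked head slightly right of the wall centre, so both frusta become asymmetric.
wall = Wall(-1.5, 1.5, 0.0, 3.0)
left_eye, right_eye = stereo_eyes((0.3, 1.7, 1.2))
print(off_axis_frustum(left_eye, wall, near=0.1))
print(off_axis_frustum(right_eye, wall, near=0.1))
```

Recomputing both frusta every frame from the tracker data is what keeps the perspective viewer-centered; a fixed, symmetric frustum would only be correct for a viewer standing exactly in front of the wall centre.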

Our aim is to turn Virtual Reality into a complete representation of a virtual environment by extending the interaction capabilities which, together with the corresponding hardware, are necessary to guarantee the user’s immersion in the virtual reality [31]. However, even the Cave has restricted interaction capabilities, as the user can only interact within the currently displayed perspectives. Furthermore, natural movement is very limited, as locomotion through the virtual environment is usually restricted to the currently shown spot of the scenario. Yet natural movements, including walking, running or even jumping through virtual reality, are decisive for a highly immersive experience within the virtual environment.

This interaction gap has to be filled by advanced technical devices without 
sacrificing high-quality graphical representations of the virtual environment. Hence, 
in this publication, we introduce the Virtual Theatre, which combines the visualization 
and interaction techniques mentioned above. The technical setup and the application 
of the Virtual Theatre in virtual scenarios are described in the next section. 


3 The Virtual Theatre: Enabling Virtual Reality in Action 


The Virtual Theatre was developed by the MSEAB Weibull Company [32] and was 
originally designed for military training purposes. However, as shown by Ewert 
et al. [33], its use can also be extended to meet the educational requirements 
of engineering students. It consists of four basic elements. The centerpiece, 
referred to as the omnidirectional treadmill, represents the Virtual Theatre’s 
unique characteristic. Besides this moving floor, the Virtual Theatre consists of 
a Head Mounted Display, a tracking system and a cyber glove. Together, these 
technical devices form a virtual reality simulator that combines the advantages 
of the conventional approaches to creating virtual reality in one setup. This 
setup is described in the following. 

The Head Mounted Display (HMD) represents the visual perception part of the 
Virtual Theatre. This device consists of two screens that are located in a kind 
of helmet and enable stereo vision. These two screens, one for each eye of the user, 
create a three-dimensional representation of the virtual environment in the user’s 
perception. HMDs were first described by Fisher [34] and Teitel [35] as devices that 
use motion in order to create VR. The characteristic feature of the HMD is therefore 
that its displays stay aligned perpendicular to the user’s view, so that the 
representation of the virtual environment adjusts to the user. Each display of the 
HMD provides a 70° stereoscopic field with SXGA resolution in order to create a 
gapless graphical representation of the virtualized scenario [33]. For our specific 
setup, we use the zSight Head Mounted Display [36]. An internal sound system in the 
HMD provides an acoustic accompaniment to the visualization to complete the 
immersive scenario. 

As already mentioned, the ground part of the Virtual Theatre is the omnidirectional 
treadmill, which represents the navigation component of the Virtual Theatre. The 
moving floor consists of rigid rollers with increasing circumferences and a common 
origin [33]. The rollers rotate towards the middle point of the floor, where a 
circular static area is located. The rollers are driven by a belt drive system, 
which is connected to all polygons of the treadmill through a system of coupled 
shafts and thus ensures the kinematic synchronization of all parts of the moving 
floor. The omnidirectional treadmill is depicted in figure 1. 


Fig. 1. Technical design of the Virtual Theatre’s omnidirectional treadmill 


On the central area, shown in the upper right corner of figure 1, the user is 
able to stand still. As soon as he steps outside of this area, the rollers start 
moving and accelerate according to the distance between his position and the middle 
area. If the user returns to the middle area, the rollers stop rotating. 
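The behavior described above amounts to a speed controller with a dead zone around the static central area. The following Python sketch illustrates the idea under stated assumptions: the radius of the static area, the gain and the speed limit are hypothetical values, not parameters of the actual MSEAB treadmill, and the linear mapping is only one plausible choice.

```python
import math

# Hypothetical parameters (assumptions, not the real treadmill's values)
STATIC_RADIUS_M = 0.5   # radius of the central static area
GAIN_PER_S = 1.5        # roller speed (m/s) per metre outside the static area
MAX_SPEED_M_S = 2.0     # safety limit for the roller surface speed


def roller_speed(x: float, y: float) -> float:
    """Return a roller surface speed for a user standing at (x, y).

    The speed is zero inside the static area and grows linearly with the
    distance beyond its edge, clipped to MAX_SPEED_M_S. The rollers always
    move towards the centre, so only the magnitude is computed here.
    """
    distance = math.hypot(x, y)
    excess = distance - STATIC_RADIUS_M
    if excess <= 0.0:
        return 0.0  # user is on the static area: rollers stop
    return min(GAIN_PER_S * excess, MAX_SPEED_M_S)


if __name__ == "__main__":
    for position in [(0.0, 0.0), (0.6, 0.0), (1.0, 1.0), (3.0, 0.0)]:
        print(position, round(roller_speed(*position), 3))
```

Clipping the output keeps the rollers within a safe speed even if the user strays far from the centre.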

The tracking system of the Virtual Theatre is equipped with ten infrared cameras 
that are evenly distributed around the treadmill, 3 m above the floor. By recording 
the position of designated infrared markers attached to the HMD and the hand of the 
user, the system is capable of tracking the user’s movements [33]. Due to the 
asymmetrical arrangement of the infrared markers, the tracking system can calculate 
not only the position of the user, but also the viewing direction. That way, the 
three-dimensional representation of the virtual scenario can be adjusted according 
to the user’s current head position and orientation. Furthermore, the infrared 
tracking system is used to adjust the rotation speed of the rollers not only 
according to the user’s distance from the middle point, but also according to 
the change of this distance within a discrete time interval. Using these enhanced 
tracking techniques, the system can handle situations in which the user stands 
still while not being located in the middle of the omnidirectional floor. 
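Using both the distance and its change over a discrete time interval resembles a PD-style controller. The sketch below is one plausible reading of that mechanism, not the vendor's implementation: the gains, the 100 Hz sample interval and the combination rule are all assumptions. Under this reading, the rollers speed up while the user walks outward and slow down while the distance shrinks, for example when a user who has stopped off-centre is carried back.

```python
import math
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class RollerController:
    """Hypothetical PD-style roller speed controller (illustrative gains)."""
    static_radius: float = 0.5   # m, radius of the static central area
    kp: float = 1.5              # 1/s, gain on the distance outside the static area
    kd: float = 0.8              # gain on the change of that distance per second
    max_speed: float = 2.0       # m/s, safety limit
    dt: float = 0.01             # s, tracking sample interval (assumed 100 Hz)
    _prev_excess: Optional[float] = field(default=None, init=False, repr=False)

    def update(self, x: float, y: float) -> float:
        """Return the roller speed for the latest tracked position (x, y)."""
        excess = max(math.hypot(x, y) - self.static_radius, 0.0)
        # Discrete rate of change of the distance over one sample interval
        if self._prev_excess is None:
            rate = 0.0
        else:
            rate = (excess - self._prev_excess) / self.dt
        self._prev_excess = excess
        if excess == 0.0:
            return 0.0  # on the static area the rollers stop
        speed = self.kp * excess + self.kd * rate
        # Never drive the rollers backwards, and respect the speed limit
        return min(max(speed, 0.0), self.max_speed)
```

A user standing still off-centre yields a zero rate term, so the speed then depends on the distance alone.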

The cyber glove provides the tactile interaction capabilities. This glove 
is equipped with 22 sensors, as indicated above, which are capable of determining the 
user’s hand position and gestures [33]. This enables the triggering of gesture-based 
events such as grasping objects. Additionally, special programmable gestures can 
be used to implement specific interaction commands. 
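Programmable gestures can be matched by comparing each 22-value glove reading with recorded templates. The following sketch uses nearest-neighbour template matching with a rejection threshold. This is one simple technique, not necessarily the one the Virtual Theatre uses. The sensor normalisation, the template names and the threshold are hypothetical.

```python
import math
from typing import Dict, Optional, Sequence

NUM_SENSORS = 22  # the glove described above reports 22 sensor values


def classify_gesture(
    reading: Sequence[float],
    templates: Dict[str, Sequence[float]],
    max_distance: float,
) -> Optional[str]:
    """Return the name of the closest gesture template, or None.

    `reading` and every template are vectors of NUM_SENSORS normalised values
    (0.0 = fully open, 1.0 = fully bent is an assumed convention).
    """
    if len(reading) != NUM_SENSORS:
        raise ValueError(f"expected {NUM_SENSORS} sensor values, got {len(reading)}")
    best_name, best_dist = None, math.inf
    for name, template in templates.items():
        dist = math.dist(reading, template)  # Euclidean distance (Python 3.8+)
        if dist < best_dist:
            best_name, best_dist = name, dist
    # Reject readings that are not close to any programmed gesture
    return best_name if best_dist <= max_distance else None


if __name__ == "__main__":
    # Hypothetical templates recorded by a user
    gestures = {"open_hand": [0.0] * NUM_SENSORS, "grasp": [0.9] * NUM_SENSORS}
    print(classify_gesture([0.85] * NUM_SENSORS, gestures, max_distance=1.0))
```

The rejection threshold keeps ambiguous hand postures from triggering commands accidentally.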

After the required hardware of the Virtual Theatre has been set up, the user can plunge 
into different scenarios and become immersed in virtual reality. Building on the 
learning and interaction scenarios described in [16], our main interest here is 
the development of remote laboratories, which represent the first step 
towards the realization of a virtual factory. The development, testing and evaluation 
of our first “Remote Lab” are described in the next section. 


4 Development of Remote Laboratories in the Virtual Theatre 


The described setup of the Virtual Theatre can be used to immerse the user in a 
virtual reality scenario not only for demonstration purposes, but especially for 
scenarios that require distinctive interaction between the user and the 
simulation. One of these applications is the realization of remote 
laboratories, which represent the first step towards transferring real-world systems 
such as a factory or a nuclear power plant into virtual reality. 


Fig. 2. Two cooperating ABB IRB 120 six-axis robots 


The virtual remote laboratory described in this paper consists of a virtual representation 
of two cooperating robot arms that are set up within our laboratory environment 
(see figure 2). These robots are located on a table in such a way that they can perform 
tasks by executing collaborative actions. For our information and communication 
infrastructure, it does not matter whether the robots are located in the same laboratory 
as our Virtual Theatre or in a distant, remote laboratory. In this context, 
our first aim was to create a virtual representation of the actual robot movements. 
In a second step, we want to control and navigate the robots. 

In order to visualize the movements of the robot arms in virtual reality, we first 
had to design three-dimensional models of the robots. The robot arms installed 
in our laboratory are ABB IRB 120 six-axis robotic arms [37]. To model the robots, 
we use the 3-D modeling and rendering software Blender [38]. After modeling the 
individual sections of the robot, which are connected by the joints of the six 
rotation axes, the sections had to be merged into a full robot arm model using a 
bone structure. Using the PhysX engine, the resulting mesh is capable of moving its 
joints together with the corresponding bones in the same fashion as a real robot arm. 
In principle, this realistic modeling enables the six-axis robot model in virtual 
reality to follow the movements of the real robot. The virtual environment that 
contains the embedded robot arms is designed using the WorldViz Vizard Framework [39], 
a toolkit for setting up virtual reality scenarios. 
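A bone structure behaves like a kinematic chain: each joint’s transform is composed with that of its parent. A common way to express this for robot arms is forward kinematics with standard Denavit-Hartenberg (DH) parameters. The source does not say that the authors used DH, and the six-axis parameters below are placeholders rather than the IRB 120’s. The sketch only shows how given joint angles determine the end-effector position.

```python
import math
from typing import List, Sequence, Tuple

Matrix = List[List[float]]


def dh_transform(theta: float, d: float, a: float, alpha: float) -> Matrix:
    """4x4 homogeneous transform of one joint (standard Denavit-Hartenberg)."""
    ct, st = math.cos(theta), math.sin(theta)
    ca, sa = math.cos(alpha), math.sin(alpha)
    return [
        [ct, -st * ca, st * sa, a * ct],
        [st, ct * ca, -ct * sa, a * st],
        [0.0, sa, ca, d],
        [0.0, 0.0, 0.0, 1.0],
    ]


def matmul(p: Matrix, q: Matrix) -> Matrix:
    """Product of two 4x4 matrices."""
    return [[sum(p[i][k] * q[k][j] for k in range(4)) for j in range(4)] for i in range(4)]


def forward_kinematics(
    joint_angles: Sequence[float],
    dh_params: Sequence[Tuple[float, float, float]],
) -> Tuple[float, float, float]:
    """Compose the joint transforms from base to tool, like a parent-child bone
    chain, and return the end-effector position. dh_params holds (d, a, alpha)
    for each joint; joint_angles are the rotations theta in radians."""
    if len(joint_angles) != len(dh_params):
        raise ValueError("need one (d, a, alpha) triple per joint")
    pose = [[float(i == j) for j in range(4)] for i in range(4)]  # identity
    for theta, (d, a, alpha) in zip(joint_angles, dh_params):
        pose = matmul(pose, dh_transform(theta, d, a, alpha))
    return pose[0][3], pose[1][3], pose[2][3]


# Placeholder parameters for a generic six-axis arm (metres, radians).
# These are NOT the ABB IRB 120 values; take real ones from the manufacturer.
SIX_AXIS_PLACEHOLDER = [
    (0.3, 0.0, -math.pi / 2),
    (0.0, 0.25, 0.0),
    (0.0, 0.05, -math.pi / 2),
    (0.3, 0.0, math.pi / 2),
    (0.0, 0.0, -math.pi / 2),
    (0.1, 0.0, 0.0),
]

if __name__ == "__main__":
    print(forward_kinematics([0.0, -math.pi / 2, 0.0, 0.0, 0.0, 0.0], SIX_AXIS_PLACEHOLDER))
```

Feeding the joint angles received from the real robots into such a chain is what keeps the virtual model’s pose consistent with the physical arm.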

After the creation of the virtual representation of the robots, an information and 
communication infrastructure had to be set up in order to enable the exchange of in- 
formation between the real laboratory and the simulation. The concept of the inter- 
communication as well as its practical realization is depicted in figure 3. 


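[Figure 3 diagram: User, ABB IRB 120 and ROS on the real laboratory side; WorldViz Vizard, Graphical Representation and User on the virtual side; the two sides are linked via Protobuf]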


Fig. 3. Information and Communication Infrastructure of the remote laboratory setup 


As shown in the figure, the hardware of the remote laboratory setup is connected 
through an internal network. On the left side of the figure, a user operates the 
movements of the real robot arms manually through a control interface of the 
ABB IRB 120 robots. This data is processed by a computer running Linux with the 
Robot Operating System (ROS). The interconnection between the real laboratory and 
the virtual remote laboratory demonstrator is realized using the Protocol Buffers 
(Protobuf) serialization method for structured data. This interface description 
language, which was developed by Google [40], can be used to exchange data 
between different applications in a structured form. 
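In practice, a Protobuf message is described in a `.proto` file and serialized with classes generated by `protoc`. To show what actually travels between the ROS computer and the Vizard side, the dependency-free sketch below hand-encodes the Protocol Buffers wire format for a hypothetical message `JointState { uint32 robot_id = 1; repeated double joint_angles = 2; }`. The message name and fields are assumptions, since the source does not give the schema. The sketch always writes field 1, even when its value is zero. Generated proto3 code would omit such a default-valued field, but parsers accept it either way.

```python
import struct
from typing import List, Tuple


def encode_varint(value: int) -> bytes:
    """Encode a non-negative integer as a Protocol Buffers base-128 varint."""
    if value < 0:
        raise ValueError("this sketch only handles non-negative integers")
    out = bytearray()
    while True:
        byte = value & 0x7F
        value >>= 7
        if value:
            out.append(byte | 0x80)  # continuation bit: more bytes follow
        else:
            out.append(byte)
            return bytes(out)


def decode_varint(data: bytes, pos: int) -> Tuple[int, int]:
    """Decode a varint starting at `pos`; return (value, next position)."""
    result, shift = 0, 0
    while True:
        byte = data[pos]
        pos += 1
        result |= (byte & 0x7F) << shift
        if not byte & 0x80:
            return result, pos
        shift += 7


def encode_joint_state(robot_id: int, joint_angles: List[float]) -> bytes:
    """Serialize the hypothetical JointState message described above."""
    body = b"".join(struct.pack("<d", angle) for angle in joint_angles)
    return (
        encode_varint((1 << 3) | 0) + encode_varint(robot_id)             # field 1, varint
        + encode_varint((2 << 3) | 2) + encode_varint(len(body)) + body   # field 2, packed doubles
    )


def decode_joint_state(data: bytes) -> Tuple[int, List[float]]:
    """Parse bytes produced by encode_joint_state (no unknown-field handling)."""
    robot_id, angles, pos = 0, [], 0
    while pos < len(data):
        tag, pos = decode_varint(data, pos)
        field_number, wire_type = tag >> 3, tag & 0x7
        if field_number == 1 and wire_type == 0:
            robot_id, pos = decode_varint(data, pos)
        elif field_number == 2 and wire_type == 2:
            length, pos = decode_varint(data, pos)
            chunk = data[pos:pos + length]
            angles.extend(value for (value,) in struct.iter_unpack("<d", chunk))
            pos += length
        else:
            raise ValueError(f"unexpected field {field_number} with wire type {wire_type}")
    return robot_id, angles
```

Protobuf messages are not self-delimiting, so a stream connection would additionally need framing, for example a varint length prefix before each message.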
After the robots’ position data is sent through the network interface, the information 
is interpreted by the WorldViz Vizard engine to visualize the movements of the 
actual robots in virtual reality. After first test phases and a technical optimization of 
the network configuration, the offset between the robot arm motion in reality and 
in virtual reality could be reduced to 0.2 seconds. Since the network infrastructure 
is based on internet communication methods, we expect this value not to increase 
significantly if the remote laboratory were located in a distant place, for example 
in another city or on the other side of the globe. 

The second user, who is depicted in the upper right part of figure 3 and is 
located in the Virtual Theatre, is immersed in the virtual reality scenario and can 
observe the positions and motions of the real robots in the virtual environment. 
Figure 4 illustrates the full setup of the real and the remote laboratory. 


Fig. 4. Manual control of the robots and visual representation in the Virtual Theatre 


In the foreground of the figure, two users are controlling the movements of the actual 
robots in the real laboratory using manual control panels. In the background on 
the right side of the picture, the virtual representation of the two ABB IRB 120 robot 
arms is shown. The image on the right-hand wall is generated using two digital 
projectors, which create a realistic 3-D image by overlapping their projections. 
The picture shown above the robots’ table represents what the user in the VR 
simulator actually sees during the simulation. It was artificially inserted into 
figure 4 for demonstration purposes. 

This virtual remote laboratory demonstrator shows that it is already possible to 
create an interconnection between the real world and virtual reality. 


5 Evaluation 


The results of first evaluations in the test mode of our virtual remote laboratory 
demonstrator have shown that the immersive character of the virtual reality simulation 
has a major impact on the learning behavior and especially on the motivation of 
the users. In our test design, students were first encouraged to implement specific 
movements of an ABB IRB 120 robot using the Python programming language. 
After this practical phase, the students were divided into two groups. 

The first group had the chance to watch a demonstration of the six-axis robots carrying 
out a task using “LEGO” bricks. After seeing the actual movements of the robots 
in our laboratories, the students were quite motivated to understand how to 
automate the intelligent behavior of the two collaborating robots. 

The second group of students had the opportunity to take part in a remote laboratory 
experiment in the Virtual Theatre. After experiencing the robot movements in the 
simulated virtual environment, performing the same task as the real-world demonstrator, 
the students could watch a video recording of the laboratory experiment they had just 
experienced in the Virtual Theatre. Their reactions to the video showed that the 
immersion had been more impressive than the observation of the actual robots’ 
movements by the other group. Accordingly, the students of the second group were 
even more motivated after their walk through the virtual laboratory. They even 
wanted to stay in the laboratory until they had finished automating the same robot 
tasks they had just seen in virtual reality. 


6 Conclusion and Outlook 


In this paper, we have described the development of a virtual reality demonstrator for 
the visualization of remote laboratories. Through the visualization techniques 
demonstrated in the Virtual Theatre, we have shown that it is possible to impart 
experiential knowledge to students regardless of their current location. This opens 
new possibilities for experience-based and problem-based learning. The implemented 
demonstrator contributes to our aim of establishing advanced teaching methodologies, 
a major goal of our research project “ELLI” (Exzellentes Lehren und Lernen in den 
Ingenieurwissenschaften, Excellent Teaching and Learning in Engineering Sciences), 
which addresses this type of problem-based learning [13]. The visualization of 
real-world systems in virtual reality enables the training of problem-solving 
strategies in a virtual environment and on real objects at the same time. 

The next steps of our research consist of extending the existing demonstrator with 
bidirectional communication between the Virtual Theatre demonstrator and 
the remote laboratory. Through this bidirectional communication, we want to enable 
direct control of the real laboratory from the remote virtual reality demonstrator. 
First results from testing this bidirectional communication indicate that such remote 
control can be realized in the near future. In order to enable safe remote control 
of the remote laboratory, collision avoidance and other safety systems for 
cooperating robots will be implemented and tested in the laboratory environment. 

As the overall goal of our project is the development of virtual factories that 
enable visits to a nuclear power plant or other inaccessible places, our research 
efforts will ultimately focus on the development of a detailed demonstrator for the 
realistic representation of an industrial environment. 
Abstract. Learning activities need not take place only in traditional physical 
classrooms; they can also be set up in virtual environments. Therefore, the authors 
propose a novel augmented reality system to organize a class, supporting 
real-time collaboration and active interaction between educators and learners. 
A pre-processing phase is integrated into a visual search engine, the heart of 
our system, to recognize printed materials with low computational cost and high 
accuracy. The authors also propose a simple yet efficient visual saliency estimation 
technique based on regional contrast to quickly filter out low-information 
regions in printed materials. This technique not only reduces the unnecessary 
computational cost of keypoint descriptors but also increases the robustness and 
accuracy of visual object recognition. Our experimental results show that the whole 
visual object recognition process can be sped up by a factor of 19 and the accuracy 
can increase by up to 22%. Furthermore, this pre-processing stage is independent of 
the choice of features and matching model in a general process. Therefore, it can 
be used to boost the performance of existing systems to real time. 


Keywords: Smart Education, Active Learning, Visual Search, Saliency Image, 
Human-Computer Interaction. 


1 Introduction 


Skills for the 21st century require active learning, which places the responsibility 
for learning on learners [1] by stimulating the enthusiasm and involvement of learners 
in various activities. As learning activities are no longer limited to traditional 
physical classrooms but can be realized in virtual environments [2], we propose a new 
system with interaction via Augmented Reality (AR) to enhance attractiveness and 
collaboration for learners and educators in virtual environments. To develop a novel 
AR system for education, we follow two main criteria in the design of our proposed 
system: real-time collaboration and interaction, and naturalness of user experience. 
The first property emphasizes real-time collaboration and active interaction between 
educators and learners via augmented multimedia and social media. Simply by looking 
through a mobile device or AR glasses, an educator can monitor the progress of 
learners or groups via their interactions with augmented content in lectures. The 
educator also receives feedback from learners on the content and activities designed 
for and linked to a specific page in a lecture note or a textbook, which helps to 
improve the quality of lecture design. Learners can create comments, feedback, or 
other types of social media targeting a section of a lecture note or a page of a 
textbook for other learners or the educator. A learner can also be notified about 
social content created by other team members during teamwork. 

The second property of the system is the naturalness of user experience, as the system 
is aware of the context, i.e. which section of a page in a lecture note or a textbook 
is being read, through natural images rather than artificial markers. Users can also 
interact with related augmented content using their bare hands. This enhances the 
users’ experience with both the aesthetic appeal of analog material and immersive, 
multisensory digital feedback from additional multimedia information. 

The core component of an AR education environment is the recognition of certain 
areas of printed materials, such as books or lecture handouts. As learners are easily 
attracted by figures or charts in books and lecture notes, we encourage educators to 
exploit learners’ visual sensitivity to graphical areas and to embed augmented content 
in such areas, rather than in text regions, of printed materials to attract learners. 
Therefore, in our proposed system, we do not use optical character recognition but 
visual content recognition to determine the reader’s context in printed materials. 

In practice, the graphical regions of interest that attract readers most do not 
cover a whole page. Other regions, such as small decorations or small text, provide 
little useful information for visual recognition. Therefore, we propose a novel 
method based on a saliency metric to quickly eliminate unimportant or noisy regions 
in printed lecture notes or textbooks and to speed up the visual context recognition 
process on mobile devices or AR glasses. Our experimental results show that the whole 
visual object recognition process can be sped up by a factor of 19 and the accuracy 
can increase by up to 22%. 

This paper is structured as follows. In Section 2, the authors briefly present and 
analyze related work. The proposed system is presented in Section 3. In Section 4, 
we present the core component of our system, the visual search engine. The experiments 
and evaluations are presented in Section 5. We then discuss potential uses of the 
system in Section 6. Finally, Section 7 presents conclusions and ideas for future work. 


2 Related Work 


2.1 Smart Educational Environment 


Active learning methods place the responsibility for learning on learners [1]. To 
create an environment in which learners study efficiently with active learning methods, 
educators should prepare and design various activities to attract learners. Educators 
should also keep track of the progress of each member in teamwork and stimulate the 
enthusiasm and collaboration of all learners in projects. 
Learning activities need not take place only in traditional physical classrooms but 
can take place in virtual environments as well [3]. An educator needs to use various 
techniques to attract learners’ interest and attention and to deliver knowledge 
impressively. Augmented Reality (AR) is an emerging technology that enables learners 
to explore the world of knowledge through the manipulation of virtual objects in the 
real world. 

AR has been applied in education to help learners study new concepts more easily. 
With handheld displays, users can see virtual objects appearing on the pages of 
MagicBook [4] from their own viewpoint. After this work was published, several 
implementations of AR books were created for education, storytelling, simulation, game, 
and artwork purposes, such as the AR Vulcano Kiosk and the S.O.L.A.R system [5]. 

AR has also shown great potential for creating an interactive and more interesting 
learning environment for learners. Accordingly, methods such as interactive study 
and collaborative study have been proposed to exploit this potential. The classroom 
environment can be implemented in many ways, e.g. collaborative augmented multi-user 
interaction [2] and mixed reality learning spaces [3]. 

However, these systems still have some limitations. First, they do not explicitly 
describe mechanisms and processes for educators and learners to interact and 
collaborate efficiently in virtual environments with AR. Second, educators may not 
receive feedback from learners with which to redesign or reorganize the augmented 
data and content linked to sections of a printed material and thus improve the 
quality of educational activities. Third, although AR systems permit different users 
to get augmented information corresponding to different external contexts, all users 
receive the same content when looking at the same page of the book. Finally, these 
systems usually give an unnatural feeling due to their use of artificial markers. 

These problems motivate us to propose our smart education environment 
with AR and personalized interaction to enhance the attractiveness and immersive 
experience for educators and learners in virtual environments and to improve the 
efficiency of teaching and learning. In our proposed system, educators can receive 
explicit and implicit feedback from learners on the content and activities that are 
designed and linked to a specific lecture in a printed material. 


2.2 Visual Sensitivity of Human Perception 


A conventional approach to evaluating the attraction of objects in an image is based on 
textural information. In this direction, regional structural analysis algorithms based 
on gradients are used to detect features. However, saliency is considered to better 
reflect the sensitivity of human vision to certain areas of an image and thus benefits 
context-aware systems [6]. Visual saliency [7], the perceptual quality by which an 
object, person, or pixel stands out from its neighbors and thus captures our attention, 
is investigated by multiple disciplines including cognitive psychology, neurobiology, 
and computer vision. Saliency maps are topographical maps of the visually salient parts 
of scenes, computed without prior knowledge of their contents, and thus remain an 
important step in many computer vision tasks. 

Saliency measures are factors that attract eye movements and attention, such as color, 
brightness, and sharpness [8]. Self-saliency is a feature that expresses the inner 
complexity of a region, including color saturation, brightness, texture, and edginess. 
Relative saliency, in contrast, indicates differences between a region and its 
surrounding regions, such as color contrast, sharpness, and location. Saliency 
measures can be combined with different weights to determine important regions more 
efficiently. 
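Combining saliency measures with different weights can be as simple as normalising each measure across regions and taking a weighted sum. In the sketch below, the measure names and weights are hypothetical, and min-max normalisation is one common choice among several. The source does not specify which measures or weights are used.

```python
from typing import Dict, List


def min_max_normalise(values: List[float]) -> List[float]:
    """Scale scores to [0, 1]; a constant list maps to zeros."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]


def combined_saliency(measures: Dict[str, List[float]], weights: Dict[str, float]) -> List[float]:
    """Weighted sum of normalised per-region saliency measures.

    `measures` maps a measure name (e.g. "brightness", "color_contrast") to one
    score per region; `weights` gives each measure's (hypothetical) weight.
    Measures without a weight contribute nothing.
    """
    n_regions = len(next(iter(measures.values())))
    total = [0.0] * n_regions
    for name, scores in measures.items():
        if len(scores) != n_regions:
            raise ValueError(f"measure {name!r} has the wrong number of regions")
        weight = weights.get(name, 0.0)
        for i, score in enumerate(min_max_normalise(scores)):
            total[i] += weight * score
    return total


if __name__ == "__main__":
    scores = combined_saliency(
        {"brightness": [10, 20, 30], "color_contrast": [0.5, 0.1, 0.3]},
        {"brightness": 0.4, "color_contrast": 0.6},
    )
    print(scores)  # the third region scores highest
```

Normalising first prevents a measure with a large numeric range, such as raw brightness, from dominating the weighted sum.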

Most salient object detection techniques can be characterized either as bottom-up 
saliency analysis, which is data-driven [9], or as top-down approaches, which are 
task-driven [6]. We focus on pre-attentive bottom-up saliency detection techniques. 
These methods extend expert-driven models of human saliency, which tend to use 
cognitive psychological knowledge of the human visual system and to treat image 
patches on edges and junctions as salient, using local contrast or globally unique 
frequencies. Local contrast methods are based on investigating the rarity of an image 
region with respect to local neighborhoods [8], whereas global contrast based methods 
evaluate the saliency of an image region using its contrast with respect to the 
entire image [10]. 
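Global contrast can be illustrated on a grayscale image. Each pixel’s saliency is the sum of its intensity differences to all other pixels, and a histogram lets this be computed once per gray level instead of once per pixel pair. The sketch below is a simplification written for illustration: it is not claimed to be the exact method of [10], and real systems typically use color. A local contrast method would instead restrict the sum to a neighbourhood.

```python
from typing import List, Sequence


def global_contrast_saliency(gray: Sequence[Sequence[int]]) -> List[List[float]]:
    """Global-contrast saliency for an 8-bit grayscale image (rows of ints 0-255).

    Saliency of a pixel with level l is sum_k count(k) * |l - k|, i.e. its
    total intensity difference to every pixel of the image, scaled so that
    the most salient pixel present in the image gets 1.0.
    """
    hist = [0] * 256
    for row in gray:
        for v in row:
            hist[v] += 1
    # One saliency value per gray level, computed from the histogram
    level_sal = [sum(c * abs(level - k) for k, c in enumerate(hist) if c) for level in range(256)]
    peak = max((level_sal[v] for row in gray for v in row), default=0) or 1  # flat image: avoid /0
    return [[level_sal[v] / peak for v in row] for row in gray]


if __name__ == "__main__":
    image = [[0, 0, 0], [0, 255, 0], [0, 0, 0]]
    for row in global_contrast_saliency(image):
        print(row)  # the single bright pixel is the most salient
```

The histogram keeps the cost proportional to the number of pixels plus a fixed 256 x 256 table, which matters on mobile devices.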

In this paper, the authors propose an efficient computation method based on human 
vision to automatically detect highly informative regions based on regional contrast, 
in order to determine which regions contain meaningful keypoint candidates. This 
reduces redundant candidates for further processing steps. 


3 Overview of Proposed System 


3.1 Motivations: Advantages of Smart Environment 


The main objective of our system is to create a smart interactive education environment 
that supports real-time collaboration and active interaction between educators and 
learners. Through mobile devices or AR glasses, both educators and learners are linked 
to the virtual learning environment with real-time communication and interaction. 
Our proposed system has the following main characteristics: 


1. Interactivity: Learners and educators can interact with augmented content, includ- 
ing multimedia and social media, or interact with others via augmented activities, 
such as exercises or discussion. 

2. Personalization: Augmented content and activities can be adapted to each learner 
to provide the most appropriate, individualized learning paradigm. The adaptation 
can be in active or passive mode. In active mode, each learner can customize which 
types of augmented content and activities he or she wants to explore or participate 
in. In passive mode, an educator can individualize teaching materials to match the 
progress, knowledge level, personal skills and attitudes of each learner. 

3. Feedback: Interactive feedback from learners can be used to help an educator redesign 
existing teaching materials or design future teaching activities. Besides, a learner’s 
feedback can also be used to analyze his or her personal interests, knowledge level, 
personal skills and attitudes toward certain types of learning activities. 

4. Tracking: The progress of a learner or a group of learners can be monitored so that 
an educator can keep track of the performance of each individual or a group. 
3.2 Typical Scenarios of Usage for an Educator 
The proposed system provides an educator with the following main functions: 


1. Design augmented content and activities for lectures 

2. Personalize or customize augmented content and activities for each learner or a 
group of learners 

3. Monitor feedback and progress of each learner or a group of learners 


The first step in creating an AR-supported lecture is to design augmented content and 
activities for lectures. Lecture documents in each course include textbooks, reference 
books, and other printed materials. An educator can now freely design lectures with 
attached augmented materials (including multimedia, social media, or activities) that 
can be revised and updated over terms/semesters, specialized for different classes in 
different programs such as regular or honors programs, and adapted to different 
languages. Because of the wide variety of attachable augmented media and activities, 
an educator can customize the curriculum and teaching strategies used to deliver a lecture. 

An educator uses our system to design augmented content (including multimedia 
objects or activities) for a lecture and links such content to a specific region 
of a printed page of a lecture note/textbook (cf. Figure 1). Augmented media are 
not only traditional multimedia content, such as images, 3D models, videos, and 
audio, but also social media content or activities, such as different types of 
exercises, a URL to a reference document, or a discussion thread in an online forum. 
Fig. 1. Design augmented content for printed lecture notes and textbooks 


For a specific page, an educator first selects a graphical region that can visually 
attract learners’ attention, and links it to augmented content, either resources or 
activities. The system automatically learns features to recognize the selected graphical 
region and uploads them, together with the embedded resources, to a remote server. 
An educator can also design different sets of augmented content for the same printed 
teaching materials for different groups of learners in the same class, or for classes 
in different programs, to utilize various teaching strategies and learning paradigms. 

After designing AR-supported teaching materials, an educator can interact with 
learners via augmented activities during a course. Useful information on learners’ 
activities and interactions is delivered to the educator, who can then keep track 
of the progress of a learner or a group of learners, update and customize augmented 
resources or activities to meet learners’ expectations or level of knowledge, 
and redesign the teaching materials for future classes. 
3.3 Typical Scenarios of Usage for a Learner 


A learner can use a mobile device or smart glasses to look at pages in a textbook, a 
reference book, or a printed lecture handout. Upon receiving the visual information of 
what the learner is looking at, the server finds the best match (cf. Section 4). Then 
the system transforms the reality in front of the learner’s eyes into an augmented 
world with linked media or activities. Dynamic augmented content that matches the 
learner’s personal profile and preferences is downloaded from the server and 
displayed on the learner’s mobile device screen or glasses (cf. Figure 2). 
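The text does not detail at this point how the server finds the best match. Section 4 describes the authors’ own pre-processing. A common baseline, assumed here rather than taken from the paper, is nearest-neighbour matching of local descriptors with Lowe’s ratio test, followed by choosing the stored page with the most good matches. The sketch uses brute force on toy two-dimensional descriptors. The ratio of 0.8, the `min_matches` cut-off and the page identifiers are hypothetical.

```python
import math
from typing import Dict, List, Optional, Sequence

Descriptor = Sequence[float]


def count_good_matches(query: List[Descriptor], template: List[Descriptor], ratio: float = 0.8) -> int:
    """Count query descriptors whose nearest template descriptor passes
    Lowe's ratio test (nearest distance < ratio * second-nearest distance)."""
    if len(template) < 2:
        return 0
    good = 0
    for q in query:
        d1, d2 = sorted(math.dist(q, t) for t in template)[:2]
        if d1 < ratio * d2:
            good += 1
    return good


def best_page(query: List[Descriptor], database: Dict[str, List[Descriptor]], min_matches: int = 10) -> Optional[str]:
    """Return the id of the stored page with the most good matches, or None
    if even the best page has fewer than `min_matches`."""
    scores = {page: count_good_matches(query, descs) for page, descs in database.items()}
    if not scores:
        return None
    page, score = max(scores.items(), key=lambda item: item[1])
    return page if score >= min_matches else None


if __name__ == "__main__":
    db = {"page_1": [(0, 0), (10, 10), (20, 0)], "page_2": [(100, 100), (110, 110), (120, 100)]}
    print(best_page([(0.1, 0), (10, 10.1), (20, 0.2)], db, min_matches=2))
```

A production system would replace the brute-force loop with an approximate nearest-neighbour index.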


Fig. 2. Learners use the proposed system 

Learners can interact with these virtual objects using their bare hands. A skin 
detection algorithm enables learners to use their bare hands to interact with virtual 
objects appearing in front of their eyes. An event corresponding to a virtual object is 
generated if that object is occluded by a skin-colored object for long enough. 
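The source does not specify the skin model, the thresholds or the dwell time. The sketch below assumes a fixed chroma range in the YCbCr color space, a widely used heuristic whose exact bounds vary in the literature. It uses full-range BT.601 RGB-to-YCbCr conversion and a frame counter that fires once when the object’s screen area stays mostly skin-colored for a given number of consecutive frames. The ranges, the 50% coverage fraction and the frame count are hypothetical.

```python
from typing import Iterable, Tuple

RGB = Tuple[int, int, int]

# Assumed YCbCr skin-colour box (a common heuristic; bounds vary by source)
CB_RANGE = (77, 127)
CR_RANGE = (133, 173)


def is_skin(pixel: RGB) -> bool:
    """Classify one RGB pixel as skin using its Cb/Cr chroma (BT.601, full range)."""
    r, g, b = pixel
    cb = 128 - 0.168736 * r - 0.331264 * g + 0.5 * b
    cr = 128 + 0.5 * r - 0.418688 * g - 0.081312 * b
    return CB_RANGE[0] <= cb <= CB_RANGE[1] and CR_RANGE[0] <= cr <= CR_RANGE[1]


def skin_fraction(pixels: Iterable[RGB]) -> float:
    """Fraction of skin-coloured pixels in the screen area of a virtual object."""
    pixels = list(pixels)
    return sum(map(is_skin, pixels)) / len(pixels) if pixels else 0.0


class DwellTrigger:
    """Fire once when an object stays occluded for `required_frames` frames."""

    def __init__(self, required_frames: int, min_fraction: float = 0.5):
        self.required_frames = required_frames
        self.min_fraction = min_fraction
        self._count = 0

    def update(self, object_pixels: Iterable[RGB]) -> bool:
        """Feed the pixels covering the object in the current frame."""
        if skin_fraction(object_pixels) >= self.min_fraction:
            self._count += 1
            return self._count == self.required_frames  # fire only once per dwell
        self._count = 0  # occlusion interrupted: start counting again
        return False
```

Requiring several consecutive frames filters out hands that merely pass over a virtual object.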
Fig. 3. Interaction and feedback 


Learners can add a new virtual note or comment to a specific part of a printed lecture note and share it with others. They can also do exercises embedded virtually as augmented content linked to printed lecture notes. While learners use the system, their behaviors are captured as implicit feedback to the educator (cf. Figure 3). An educator can then analyze learners’ behaviors and intentions to adapt the teaching materials to each learner in a group. Using collaborative filtering methods, the system can recommend to educators which types of augmented content are appropriate for a specific learner based on learners’ profiles.
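The paper does not specify which collaborative filtering method is used. As one plausible instance, the user-based sketch below predicts how strongly a learner would engage with content types they have not used yet, from the engagement of learners with similar profiles. Cosine similarity, the engagement scores and the content-type names are all assumptions for illustration.

```python
import math


def cosine(u, v):
    """Cosine similarity between two learners over the content types both have used."""
    common = [k for k in u if k in v]
    if not common:
        return 0.0
    dot = sum(u[k] * v[k] for k in common)
    norm_u = math.sqrt(sum(u[k] ** 2 for k in common))
    norm_v = math.sqrt(sum(v[k] ** 2 for k in common))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def recommend(ratings, learner, top_n=2):
    """Rank content types the learner has not used yet by the similarity-weighted
    average engagement of other learners (user-based collaborative filtering)."""
    target = ratings[learner]
    weighted_sum, weight_total = {}, {}
    for other, prefs in ratings.items():
        if other == learner:
            continue
        sim = cosine(target, prefs)
        if sim <= 0:
            continue  # ignore dissimilar learners
        for item, score in prefs.items():
            if item in target:
                continue  # only predict unseen content types
            weighted_sum[item] = weighted_sum.get(item, 0.0) + sim * score
            weight_total[item] = weight_total.get(item, 0.0) + sim
    predicted = {item: weighted_sum[item] / weight_total[item] for item in weighted_sum}
    return sorted(predicted, key=predicted.get, reverse=True)[:top_n]
```

One known weakness of this plain form is that two learners sharing a single content type always get similarity 1.0; a production system would typically require a minimum overlap or mean-center the scores.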


4 Visual Search Optimization with Saliency Based Metric 


4.1 Overview 


For mobile visual search (MVS) applications, most existing methods use all keypoints detected in a given image, including those in unimportant regions such as small decorations or text areas. Unlike state-of-the-art methods, which reduce the size of each descriptor, our approach reduces the number of local features: only keypoints carrying meaningful information are considered. As our method is independent of the choice of features, combining it with compact visual descriptors should yield further efficiency gains.


Fig. 4. Our approach to detect a page in a printed lecture note or textbook (pre-processing phase: original image, saliency map, main object, local features, image database)


We propose using the saliency map of an image to quickly discard keypoints in unimportant or insensitive regions of both template images and query images (cf. Figure 4). The visual sensitivity of each region is evaluated to determine which keypoints are preserved and which are removed. This reduces the computational cost of local feature extraction. Because keypoints in unimportant regions are removed, the accuracy of visual object recognition can also improve.
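A minimal sketch of the filtering step, assuming the saliency map has already been normalized to [0, 1] and that keypoints arrive as (x, y) pixel coordinates from any detector. The fixed `threshold` is a hypothetical parameter; the paper does not state how the keep/discard decision is parameterized.

```python
def filter_keypoints(keypoints, saliency, threshold=0.5):
    """Keep only keypoints that fall in salient regions.

    keypoints: iterable of (x, y) pixel coordinates from any detector.
    saliency: saliency map as rows of values in [0, 1], same size as the image.
    threshold: hypothetical cut-off separating salient from non-salient pixels.
    """
    height, width = len(saliency), len(saliency[0])
    kept = []
    for x, y in keypoints:
        col = min(max(int(round(x)), 0), width - 1)  # clamp to the map
        row = min(max(int(round(y)), 0), height - 1)
        if saliency[row][col] >= threshold:
            kept.append((x, y))
    return kept
```

Descriptors are then computed only for the kept keypoints, so the cost of description and matching scales with the number of salient keypoints rather than with all detections.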

Figure 5 shows the two main steps of the proposed method. First, an arbitrary image is decomposed into perceptually homogeneous elements. Then, saliency maps are derived from the contrast between those elements. The proposed saliency detection algorithm is inspired by work on object segmentation with image saliency [10]. In our approach, regions of interest can be separate, and there is no need to merge them.
Fig. 5. Pre-processing phase


4.2 Image Abstraction 


To simplify the visual content of color images, it is abstracted by region-based segmentation. A region grows by adding similar neighboring pixels according to certain homogeneity criteria, gradually increasing in size. The proposed algorithm for this phase has two steps: Over-Segmentation (cf. Figure 5.B) and Region Growing (cf. Figure 5.C).

Over-Segmentation: An image is over-segmented by a watershed-like method. The resulting regions are then merged according to a color similarity criterion:

$$\lVert c_i - c_j \rVert_2 < \delta$$

where $c_i$ and $c_j$ are the colors of pixels in the same region and $\delta$ is a similarity threshold.
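As a rough illustration of the color criterion, the sketch below groups 4-connected pixels whenever neighboring colors differ by less than δ. It is a simplified stand-in, assumed for illustration, for the watershed-like over-segmentation followed by merging; the paper does not give the exact merging procedure.

```python
import math
from collections import deque


def group_by_color(image, delta):
    """Label 4-connected pixels, linking neighbours whose color distance is below delta.

    image: rows of color tuples (e.g. Lab triples). Returns a grid of region labels.
    """
    height, width = len(image), len(image[0])
    labels = [[-1] * width for _ in range(height)]
    next_label = 0
    for sy in range(height):
        for sx in range(width):
            if labels[sy][sx] != -1:
                continue  # already assigned to a region
            labels[sy][sx] = next_label
            queue = deque([(sy, sx)])
            while queue:  # breadth-first flood fill from the seed pixel
                y, x = queue.popleft()
                for ny, nx in ((y - 1, x), (y + 1, x), (y, x - 1), (y, x + 1)):
                    if (0 <= ny < height and 0 <= nx < width
                            and labels[ny][nx] == -1
                            and math.dist(image[y][x], image[ny][nx]) < delta):
                        labels[ny][nx] = next_label
                        queue.append((ny, nx))
            next_label += 1
    return labels
```

Because similarity is checked only between neighbors, a slow color gradient can chain into a single region; this is a known trade-off of neighbor-based criteria.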


Region Growing: Neighboring segments are merged based on their sizes, i.e., the number of pixels in each region. If a region's size is below a threshold, it is merged into its nearest neighboring region in terms of average Lab color distance. To speed up this step, we use Prim’s algorithm [11] to optimize region merging.
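The size-based merging rule can be sketched with a region adjacency graph. This greedy version, which always merges the currently smallest undersized region into its closest-colored neighbor, is an assumption for illustration; it does not reproduce the Prim's-algorithm optimization [11] mentioned above. The dictionary layouts for `regions` and `adjacency` are hypothetical.

```python
import math


def merge_small_regions(regions, adjacency, min_size):
    """Greedily merge each region smaller than min_size into the adjacent region
    with the closest mean Lab color.

    regions: {id: (pixel_count, (L, a, b))}; adjacency: {id: set of neighbour ids}.
    Returns {original_id: surviving_id}. The inputs are not modified.
    """
    regions = dict(regions)
    adjacency = {r: set(n) for r, n in adjacency.items()}
    owner = {r: r for r in regions}
    while True:
        small = [r for r in regions if regions[r][0] < min_size and adjacency[r]]
        if not small:
            break
        r = min(small, key=lambda k: regions[k][0])  # smallest region first
        size_r, color_r = regions[r]
        target = min(adjacency[r], key=lambda n: math.dist(color_r, regions[n][1]))
        size_t, color_t = regions[target]
        total = size_r + size_t
        # Pixel-count-weighted mean color of the merged region.
        mean = tuple((size_r * a + size_t * b) / total for a, b in zip(color_r, color_t))
        regions[target] = (total, mean)
        for n in adjacency.pop(r):  # rewire r's neighbours to the target
            adjacency[n].discard(r)
            if n != target:
                adjacency[n].add(target)
                adjacency[target].add(n)
        del regions[r]
        for k, v in owner.items():
            if v == r:
                owner[k] = target
    return owner
```

Each iteration removes one region, so the loop always terminates; isolated small regions with no neighbors are left as they are.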


4.3. Visual Saliency Estimation 


An image captured with a camera is intentionally focused on meaningful regions, because human vision reacts to regions with distinctive features such as unique colors, high contrast, or different orientations. Therefore, to estimate attractiveness, a contrast metric is commonly used to evaluate the sensitivity of image elements.

A region with a high level of contrast with its surrounding regions attracts human attention and is perceptually more important. Instead of evaluating contrast between regions in the original image, we calculate the contrast metric based on Lab color between regions in the corresponding segmented image. As the original image contains far more elements than its segmented image has regions, our approach not only reduces the computational cost but also efficiently exploits the meaningful regions of the captured image.
The contrast $C_i$ of a region $R_i$ is calculated as the difference between the Lab color of $R_i$ and those of its surrounding regions:

$$C_i = \frac{\sum_{j=1,\, j \neq i}^{n} \omega(R_j)\, \lVert c_i - c_j \rVert_2}{\sum_{j=1,\, j \neq i}^{n} \omega(R_j)} \qquad (1)$$

where $c_i$ and $c_j$ are the Lab colors of regions $R_i$ and $R_j$ respectively, $n$ is the number of regions, and $\omega(R_j)$ is the number of pixels in region $R_j$. Regions with more pixels contribute higher local-contrast weights than those containing only a few pixels. Finally, $C_i$ is normalized to the range [0, 1]. Figure 6 shows that our method can provide better results than existing saliency calculation techniques.
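The sketch below computes the region contrast of Eq. (1), reading it as the pixel-count-weighted mean Lab distance from $R_i$ to every other region, and then normalizes the scores. Min-max scaling is an assumption for the final [0, 1] normalization, since the paper does not name the normalization it uses; the `regions` layout is hypothetical.

```python
import math


def region_contrast(regions):
    """Normalized contrast C_i per region.

    C_i = sum_{j != i} w(R_j) * ||c_i - c_j|| / sum_{j != i} w(R_j),
    followed by min-max scaling to [0, 1] (scaling method assumed).
    regions: list of (pixel_count, (L, a, b)).
    """
    raw = []
    for i, (_, c_i) in enumerate(regions):
        num = den = 0.0
        for j, (w_j, c_j) in enumerate(regions):
            if j == i:
                continue
            num += w_j * math.dist(c_i, c_j)  # larger regions weigh more
            den += w_j
        raw.append(num / den if den else 0.0)
    lo, hi = min(raw), max(raw)
    return [(c - lo) / (hi - lo) if hi > lo else 0.0 for c in raw]
```

For a page with a large uniform background and a small, differently colored figure, the figure receives the highest score because its color differs from the pixel mass of the rest of the page.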


Fig. 6. Visual comparison between the proposed method and other state-of-the-art methods (columns: image, ground truth, ours, BMS [12], FT [7], GC [13], HC [14], LC [15], SR [16])


5 Experiments and Evaluation 


5.1. Page Detection Evaluation 


We conducted an experiment to evaluate the efficiency of the proposed method by matching local features extracted from images in the dataset, comparing the accuracy and performance of the proposed process with those of the original method, which does not filter out keypoints, and with other state-of-the-art saliency detection methods. Since the proposed process is independent of the keypoint extraction and recognition algorithms, we evaluated our approach with four popular local features: BRIEF [17], BRISK [18], SIFT [19], and SURF [20].

The experiments were run on a machine with an Intel Core i3 3.3 GHz CPU and 4 GB of RAM. Our dataset consists of 200 pages (resolution 566 × 750) of reference materials for Computer Science students, including MSDN Magazine, ACM Transaction Magazine, and IEEE Transaction Magazine. Each typical page contains three types of regions: background, text, and images.

All local features are extracted in two scenarios: extracting all keypoints, and extracting only keypoints in important regions. Image matching is then performed on each pair of images. Matching accuracy is computed as the proportion of correctly matched image pairs over the total number of image pairs. The results are shown in Figure 7(a).

On average, the proposed method outperforms conventional methods by up to 7% in accuracy. With SIFT features in particular, accuracy improves by approximately 22%. Moreover, to evaluate the efficiency of our approach, we replaced our saliency detection module with existing state-of-the-art methods such as BMS [12], FT [7], GC [13], HC [14], LC [15], and SR [16]. In most cases, our process provides better results than the others. Incorporating our pre-processing stage not only preserves the robustness of conventional methods but also improves their accuracy.
Fig. 7. Accuracy and performance of page detection of printed reference materials


The experiments also show that our method runs faster than the other algorithms with all common local features (cf. Figure 7(b)). On average, with SIFT, our method is 10.3 times faster than the conventional method, which does not filter out keypoints. Similarly, with BRIEF and SURF, our method is 11 and 15 times faster, respectively, and with BRISK features it is more than 19.4 times faster.

Overall, our approach not only speeds up running time by up to 19.4 times but also increases the accuracy of recognizing magazines by up to 22%. These are crucial criteria for a real-time AR system for magazines, books, and newspapers.


6 Potential Usage of Proposed System 


For each course in a specific teaching environment, it is necessary to identify which types of augmented content end users, i.e., educators and learners, require. We therefore conducted surveys to evaluate the practical need for our system in making learning more engaging and attractive for learners, including high school and undergraduate students.

In meetings with high school teachers and students on enhancing the learning experience in Chemistry, we identified two main requirements for our system. The first is 3D visualization of chemical elements, substances, atoms, molecules, and stoichiometry. The second is assisting teachers in visualizing and simulating chemical reactions. Although no activities have yet been set up for students, this represents a new teaching activity supported by our smart AR-based educational environment.

In meetings with instructors of the two courses Introduction to Information Technology 1 and 2, we identified further augmented content, including multimedia and social media data, and augmented activities that can be established via our system. These two courses give freshmen an overview of different aspects of Information Technology, as preparation and guidance for students following the Conceive-Design-Implement-Operate (CDIO) teaching strategy. With our system, we deployed a trial teaching environment in which freshman volunteers joined active learning activities with AR interactions. The volunteers used the proposed system in the two courses: students were assigned to read printed materials with AR media and to discuss and do exercises with others via our system. We collected feedback from participants to evaluate the usefulness and convenience of our system, the volunteers' satisfaction with it, and their favorite functions. Based on the qualitative interviews in this study, most students found that our system provides a more interesting and attractive way to study than traditional approaches. Moreover, the collaboration features of our system successfully attracted students’ interest and motivated them to read the documents.


7 Conclusion and Future Work 


We propose a new method for organizing a collaborative class using AR and interaction. Via our proposed system, learners and educators can actively interact with one another. Learners can do exercises embedded virtually as augmented content linked to a printed lecture note. They can also add a new virtual note or comment to a specific part of a printed lecture and share it with others. In addition, educators receive feedback from learners on the content and activities designed and linked to a specific page in a lecture note or textbook, which helps them improve their lecture designs. Educators can also keep track of the learning progress of each individual learner or group of learners.

In our proposed system, we focus on providing natural means of interaction for users. The system recognizes the context, i.e., which section of a page in a lecture note or textbook is being read, from natural images rather than artificial markers. Users can also interact with the related augmented content using their bare hands.

We also propose a new method based on a saliency metric to quickly eliminate irrelevant regions in a page of a book or printed material, improving the accuracy and performance of the context-aware process on mobile devices or AR glasses. Furthermore, our method works independently of the training and detection stages and is compatible with most well-known local features. Therefore, this stage can be incorporated into any existing system for printed material detection and recognition.

Other saliency metrics could be implemented in our visual search engine, which requires further experiments. In addition, we are interested in applying knowledge from psychology and neuroscience about human vision in further research. To enhance the system, we are developing a neural network classifier to analyze learners' profiles and learn their behaviors so that the system can make better use of them.
Abstract. A recent boom has been seen in 3D virtual worlds for entertainment, and this in turn has led to a surge of interest in their educational applications. Despite this rapid development, most such applications only strengthen traditional teaching methods on a new platform without changing the nature of how people teach and learn. Modern computer science technology should be applied in STEM education to raise learning efficiency and interest. In this paper, we focus on the reasoning, design, and implementation of a 3D virtual learning system that merges STEM experiments into a virtual laboratory and brings entertainment to knowledge learning. An advanced hand gesture interface is introduced to enable flexible manipulation of virtual objects with two hands. The ability to recognize the single hand grasping-moving-rotating activity (SH-GMR) allows a single hand to move and rotate a virtual object at the same time. We implemented several virtual experiments in the VR environment to demonstrate that the proposed system is a powerful tool for STEM education. The benefits of the system are evaluated through two virtual experiments in the STEM field.


Keywords: 3D virtual learning, Human machine interface (HCI), hand gesture interaction, single hand grasping-moving-rotating (SH-GMR), STEM education.


1 Introduction 


Digital virtual worlds have been used in education for a number of years, and as their use has become common, the question of how to provide effective support for teaching and learning has aroused continuing discussion. Modern pedagogies and their practices have considerable limitations. According to Mohan, “students are presented with the final results of knowledge but not with data and experience to think about” [1]. Many current educational tools, instead of enhancing interaction and student-centered learning, only strengthen traditional teaching methods on a new platform without changing the nature of how people teach and learn. The functions of computers in online learning environments and existing interactive systems are far from what is desirable. Examination-based teaching assessment, although widely used, has difficulty revealing teaching effects and guiding the subsequent actions of teachers and students [2]. If technology could give players timely feedback during a learning activity, learners could promptly adjust their understanding, behaviors, and implementation. Computer games are very attractive to young people and offer a possible breakthrough for new instructional technologies. By playing games in virtual worlds designed for educational purposes, students are able not only to learn knowledge but also to explore new experiments on their own [3]. The fast development of Internet-based communication technologies, e.g., online video chat, portable devices, mobile platforms, and cloud computing, allows instructors and learners to connect and access knowledge anytime and anywhere, so that they can have "face-to-face" talks and "hands-on" education even if they are not in the same classroom.

A recent boom has been seen in 3D virtual worlds for entertainment, and this in turn has led to a surge of interest in their educational applications. We focus on the reasoning and design of a 3D virtual learning system that merges real experiments into a virtual laboratory and brings entertainment to knowledge learning. The system aims to improve teaching and learning efficiency and interest by introducing an advanced human machine interface and VR interactive teaching software into classroom and online learning. We discuss the benefits of applying a hand gesture interface and a VR environment to e-learning, and present a design of the system with two examples.


2 Benefit Analysis of 3D Virtual Learning 


Our advanced hand gesture interaction and the VR environment described above are well suited to the purposes of online virtual learning and training. The benefits of the system are:


Empowerment. Pan [4] stated that VR is an empowerment technique that opens many new paths for learning. VR-based learning offers a paradigm shift from old pedagogies, since it provides interaction through all human senses, such as vision, sound, and even touch, taste, and smell [5]. The VR interactive teaching system focuses on students, their learning motivation, and their learning practice. Instead of receiving input from teachers all the time, students can control their own learning by manipulating learning materials and practicing in or out of the classroom. Even though student-centered teaching has been advocated for ages, many teachers find it hard to shift and transfer their power to students, let alone teachers who are not aware of their current roles. One reason for this difficulty is the lack of creative and friendly learning environments, in which activity and practice play an indispensable role.


Learning by Doing. When learning new and abstract concepts, e.g., global warming, sound transmission, or magnetism, we often find them hard to understand without connecting them to a concrete example. Things can be different when students have something in front of them that they can see and manipulate, because it helps them connect abstract concepts with concrete experiences. Furthermore, exploration in a 3D interactive system provides them with more practice and knowledge. For instance, in the study of heat transfer, students are not only able to understand the concept, but also able to learn under what circumstances, in what structures, and through what types of objects heat can be transferred.


Sustaining the Learning Interest. One of the benefits of game-like educational technology is that it motivates learners and sustains their learning interest by letting them control, interact with, explore, and experience their world of knowledge or game. Interaction with the system is key for learners to build on their previous knowledge, though they might go through many failed trials before they can move on to the next step. In the process of practicing, learners can modify their solutions to achieve the best performance permitted by the governing laws in STEM areas.


Better Teaching Performance. Teaching and learning are a mutual process [6]. Introducing a 3D interactive system is not meant to decrease the significance of teachers but to help them teach better by bringing technology into the classroom. Compared to words, visual products such as videos and animations carry much more information. Therefore, combining words and animations could increase the amount of information and knowledge conveyed. More importantly, teachers are able to explain topics by connecting concepts with the real world, motivating students, and sustaining their interest.


Materializing Abstract Knowledge. Many abstract concepts, models, processes, and methods in STEM fields need to be shown dynamically and vividly. Illustrating them only with text and figures may not be enough to give students the whole picture. For example, an application that explains the concept of a food chain asks players to initialize the quantities of grass, rabbits, and foxes. As the game runs, the relationship among these creatures is shown dynamically on the screen, prompting players to consider how to keep the environment in equilibrium in order to achieve ecological balance. The system can also display invisible quantities, such as energy and force, in a visible way.


Real Time Assessment. Traditional classroom teaching cannot give immediate feedback on students' learning and performance, because teachers cannot watch every student all the time. Formative assessments, summative assessments, and official assessments are three typical techniques used for classroom assessment [7]. However, these methods do not provide specific and timely information about student learning: they are often slow to respond, biased, and limited by the test writers. According to Bloom’s Taxonomy, the paper-and-pencil test, the most commonly used assessment, exercises lower-level cognitive behavior more than higher-level cognitive behavior. We address this problem by adding an intelligent assessment agent module to the system. Running as a background agent, it monitors learners' operations in real time and assesses whether instructional objectives are reached.
Improved Online Communication and Cooperation Experience by Cloud Computing. The greatest advantage of cloud computing is that information and services can be accessed anytime and anywhere from different platforms. Users can always access educational applications, personal information, performance assessments, and real-time communication with instructors or others. This greatly lowers learning costs and increases flexibility, which is very suitable for online learning. Students' learning processes, special needs, assessments, etc. can be stored in the cloud and then checked by instructors. Moreover, cloud computing provides a platform for convenient communication, collaboration, team building, and group-centered projects.


3 Design of the Hand Gesture Interface 


To meet the needs of e-learning and e-business, we designed an efficient and low-cost human computer interaction interface. Our proposed hand gesture interface recognizes the movements of two hands, hand poses (open hand and closed hand), and single hand rotations. There is no need to extract individual finger movements in this case. We also position the stereo camera carefully so that it does not have to deal with complex situations, and we design the application gestures so that hands never need to overlap.

As shown in Figure 1(a), the stereo camera is placed on top of the computer screen and tilted down to capture the hands held just above the table. This arrangement keeps users' heads out of the view. Also, ambient light is reflected almost uniformly from the hands, which reduces recognition errors caused by changing shadows. In addition, the setup is suitable for long periods of operation because users' hands are well supported by the table. Users' hands are free to move in the horizontal (x), vertical (y), and depth (z) directions, and to rotate in yaw, pitch, and roll.


Fig. 1. The design of hand gesture interaction: (a) hardware configuration, (b) left camera view, (c) right camera view.


One of our contributions to hand gesture interaction is that the system can recognize the single hand grasping-moving-rotating (SH-GMR) activity. Compared with the traditional two-hand "steering wheel" gesture [8] for rotating a virtual object, a hand gesture interface with integrated single hand rotation can fully control an object [9, 10]. Figure 2 illustrates the SH-GMR activity with an example. SH-GMR consists of three major actions: preparing, grasping, and moving and rotating.
A human hand changes its shape from the open-handed status (a hand with fingers stretched) to a grasping posture, so that an object is captured and fully controlled. Moving and rotating actions may occur simultaneously or independently. Keeping the same grasping gesture, the hand shifts or rotates, and the virtual object is shifted and rotated correspondingly. When the hand changes back to the open-handed posture, the virtual object is released from control. Compared with the traditional "steering wheel" gestures for rotating objects, this method naturally maps hand gestures in the real world to the 3D virtual space.
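The grasp, move/rotate, and release cycle above behaves like a small state machine. The sketch below is a hypothetical per-hand controller, not the paper's implementation: it assumes the pose classifier already reports `'open'` or `'closed'`, that the hand position and fist roll angle are available for each frame, and, for brevity, that the hand is already over the object when it closes. While grasped, frame-to-frame position and angle deltas are applied together, which is what lets moving and rotating happen simultaneously.

```python
from dataclasses import dataclass


@dataclass
class GraspController:
    """Hypothetical per-hand SH-GMR controller: closing the hand grasps, opening releases."""
    holding: bool = False
    last_pos: tuple = (0.0, 0.0, 0.0)
    last_angle: float = 0.0

    def step(self, pose, pos, angle, obj):
        """Process one frame.

        pose: 'open' or 'closed' (output of the pose classifier)
        pos: (x, y, z) hand position; angle: fist roll angle in degrees
        obj: dict with 'pos' (x, y, z) and 'angle', updated in place while grasped
        """
        if pose == "closed" and not self.holding:
            self.holding = True  # grasping: the object is captured, no motion yet
        elif pose == "closed":
            # Moving and rotating: apply both per-frame deltas, so they can co-occur.
            obj["pos"] = tuple(o + p - q for o, p, q in zip(obj["pos"], pos, self.last_pos))
            obj["angle"] += angle - self.last_angle
        else:
            self.holding = False  # open hand releases the object
        self.last_pos, self.last_angle = pos, angle
        return self.holding
```

Using deltas rather than absolute poses keeps the object from jumping to the hand's position at the moment of grasping.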


Fig. 2. Illustration of the SH-GMR activity: (a) initial posture, (b) grasping action, (c) moving and rotating actions


In our design, all icons, objects, and shapes are treated as physical objects and can be manipulated with natural hand gestures. Users manipulate objects by common sense rather than by memorizing a set of hand gestures. Only two poses (open and closed) need to be discriminated, which allows a wide tolerance range for users' actual postures.

Figure 3 shows a diagram of the whole system. The input sensor is the calibrated stereo camera. Hand parameters, including positions, status, rotation angles, etc., are extracted by the hand gesture interface module for each frame. The VR environment in the e-learning module reacts to the gesture input with dynamic information.


Fig. 3. The system diagram of the e-learning and e-business using the proposed hand gesture interface (blocks: stereo camera; stereo calibration and rectification; segmentation; hand postures; fist rotation detector (FRD); SH-GMR; hand parameters (positions, status, angles, etc.); VR environment; cloud computing; e-education)

4 Implementation of the 3D Virtual Learning System


4.1 Stereo Camera 


Since the two target applications are e-learning and e-business, mature and low-cost stereo imaging technology should be used. We chose webcams as the image sensors for our system. Two high-quality webcams with VGA resolution and a 30 fps frame rate are physically aligned and fixed on a metal base, which can easily be mounted on computer screens or tripods. The physical alignment makes the optical axes of the two cameras parallel and pointing in the same direction. Due to manufacturing defects and imperfect alignment, the output images must be undistorted and rectified before they are used to extract depth information.


4.2. Camera Calibration and Rectification 


Single-camera checkerboard calibration is performed for both the left and right cameras. We use Heikkila and Silven's [11] camera model, which takes the focal lengths and principal points as the camera's intrinsic parameters. Lens distortion, including radial and tangential distortion, is described by 5 parameters. Sixteen different checkerboard images are taken to ensure a robust estimation of the camera parameters. Then, stereo calibration estimates the translation vector T and the rotation R characterizing the position of the right camera relative to the left (reference) camera.

With the intrinsic parameters, an undistortion process [11] is applied to each camera in each frame to suppress tangential and radial distortion. To simplify the computation of pixel correspondences, the two image planes must first be rectified. A. Fusiello et al. [12] proposed a rectification procedure that includes image plane rotation, principal point adjustment, and focal length adjustment. Let $\mathbf{m} = [u\ v\ 1]^T$ be the homogeneous coordinates of a pixel on the right camera’s image plane. The transformation of the right camera’s image plane is

$$\mathbf{m}_{new} = (K_{new} R_{new})(K_{old} R_{old})^{-1}\, \mathbf{m}_{old}$$

where $\mathbf{m}_{old}$ and $\mathbf{m}_{new}$ are the homogeneous coordinates of pixels on the right camera’s image plane before and after rectification, $K_{old}$ and $K_{new}$ are the corresponding intrinsic matrices, $R_{new}$ is an identity matrix, and $R_{old}$ is the rotation matrix of the camera before the rotation.
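To make the rectifying transformation concrete, the sketch below maps one pixel through $(K_{new} R_{new})(K_{old} R_{old})^{-1}$ and dehomogenizes the result. It uses plain Python lists for the 3×3 matrices to stay dependency-free; the function names are hypothetical, and a full rectifier would typically apply the inverse mapping to resample every pixel of the image.

```python
def matmul3(A, B):
    """Product of two 3x3 matrices given as nested lists."""
    return [[sum(A[i][k] * B[k][j] for k in range(3)) for j in range(3)] for i in range(3)]


def inv3(M):
    """Inverse of a 3x3 matrix via its adjugate (transposed cofactor matrix)."""
    (a, b, c), (d, e, f), (g, h, i) = M
    det = a * (e * i - f * h) - b * (d * i - f * g) + c * (d * h - e * g)
    if abs(det) < 1e-12:
        raise ValueError("matrix is singular")
    adj = [
        [e * i - f * h, c * h - b * i, b * f - c * e],
        [f * g - d * i, a * i - c * g, c * d - a * f],
        [d * h - e * g, b * g - a * h, a * e - b * d],
    ]
    return [[x / det for x in row] for row in adj]


def rectify_pixel(u, v, K_old, R_old, K_new, R_new):
    """Map pixel (u, v) with m_new = (K_new R_new)(K_old R_old)^-1 m_old, then dehomogenize."""
    T = matmul3(matmul3(K_new, R_new), inv3(matmul3(K_old, R_old)))
    m = (u, v, 1.0)
    x, y, w = (sum(T[r][k] * m[k] for k in range(3)) for r in range(3))
    return x / w, y / w
```

With identical intrinsics and identity rotations the mapping reduces to the identity, which is a quick sanity check for a calibration pipeline.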


4.3. Hand Gesture Recognition 


To build skin color statistics, luminance and chrominance need to be separated. We convert the image sequence from RGB color space to YCbCr [13] by:

$$Y = 0.299R + 0.587G + 0.114B, \qquad C_r = R - Y, \qquad C_b = B - Y \qquad (1)$$

where Y is the luminance component, and $C_b$ and $C_r$ are the chrominance components. This color space conversion is performed on the images from both the left and right cameras.
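A direct transcription of the color conversion is shown below, with $Y = 0.299R + 0.587G + 0.114B$, $C_b = B - Y$ and $C_r = R - Y$. As written, $C_b$ and $C_r$ are unscaled color differences; the standard digital YCbCr encoding additionally scales and offsets them, which should matter little here because the skin model is learned directly from the converted values. The helper names are hypothetical.

```python
def rgb_to_ycc(r, g, b):
    """Luminance Y plus the unscaled color differences Cb = B - Y and Cr = R - Y."""
    y = 0.299 * r + 0.587 * g + 0.114 * b
    return y, b - y, r - y  # (Y, Cb, Cr)


def convert_frame(frame):
    """Convert a frame given as rows of (R, G, B) tuples; run it on both camera views."""
    return [[rgb_to_ycc(*pixel) for pixel in row] for row in frame]
```

Because the three luminance weights sum to 1, any gray pixel (R = G = B) maps to zero chrominance, which is what makes the skin statistics largely independent of brightness.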

Color-based segmentation is used to discriminate hands from the background. S. L. Phung et al. [14] showed that a Bayesian classifier performs better than linear classifiers and single and mixture Gaussian models. A pixel with color X is classified as skin if the likelihood ratio reaches a threshold $\tau$:

$$\frac{p(X \mid \omega_s)}{p(X \mid \omega_n)} \geq \tau \qquad (2)$$

where $\omega_s$ and $\omega_n$ denote the skin and non-skin color classes, and $p(X \mid \omega_s)$ and $p(X \mid \omega_n)$ are the conditional probability density functions of skin and non-skin colors. A color calibration procedure is needed when users first use the system: users are asked to wave their hands in the camera view so that training data for their skin color can be acquired. In this way, the system adaptively learns users' skin color as well as the lighting conditions.
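One common way to realize the likelihood-ratio test $p(X \mid \omega_s) / p(X \mid \omega_n) \geq \tau$ is to estimate both class-conditional densities with color histograms collected during the calibration step. The sketch below assumes histograms over quantized $(C_b, C_r)$ pairs; the class name, the bin size, the threshold `tau`, and the small `eps` that avoids division by zero are illustrative assumptions rather than values from the paper.

```python
from collections import Counter


class HistogramSkinClassifier:
    """Likelihood-ratio skin classifier on quantized (Cb, Cr) pairs.

    A pixel is skin if p(X | skin) / p(X | non-skin) >= tau, with both densities
    estimated by normalized histograms (bin size and tau are illustrative).
    """

    def __init__(self, bin_size=8, tau=1.0, eps=1e-6):
        self.bin_size, self.tau, self.eps = bin_size, tau, eps
        self.skin, self.non_skin = Counter(), Counter()
        self.n_skin = self.n_non_skin = 0

    def _bin(self, cb, cr):
        return (int(cb // self.bin_size), int(cr // self.bin_size))

    def fit(self, skin_samples, non_skin_samples):
        """Samples are lists of (Cb, Cr) pairs, e.g. gathered while the user waves a hand."""
        self.skin.update(self._bin(*s) for s in skin_samples)
        self.non_skin.update(self._bin(*s) for s in non_skin_samples)
        self.n_skin += len(skin_samples)
        self.n_non_skin += len(non_skin_samples)
        return self

    def is_skin(self, cb, cr):
        key = self._bin(cb, cr)
        p_skin = self.skin[key] / max(self.n_skin, 1)
        p_non_skin = self.non_skin[key] / max(self.n_non_skin, 1)
        return p_skin / (p_non_skin + self.eps) >= self.tau
```

Raising `tau` trades missed skin pixels for fewer false positives from skin-like background colors.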

We discriminate open and closed hand poses by learning geometric features extracted from the hands. A contour retrieval algorithm is applied to topologically extract all possible contours in the segmented images. We empirically take the two largest segmented areas as the hands, because the two hands are normally the largest skin-colored areas in the view. A convex hull and its vertex set are then computed [15]. Considering both computational cost and accuracy, the number of vertices after polygon approximation should be in the range of 8 to 15. Several features can be extracted from the convexity defects: the distance $l_{AB}$ between the starting point A and the ending point B of each defect, and the distance $l_{CD}$ between the depth point C and the farthest point on the hand D. The distances $l_{AB}$ and $l_{CD}$ fully describe the configuration of two adjacent fingers.

To distinguish open and closed hand poses, we train a classifier using the Cambridge Hand Gesture Dataset [16], because its images are taken from a camera position similar to ours and it provides sequences of hand actions suitable for learning hand dynamics. We select 182 images from the dataset and manually label them as $\omega_o$ (open hand) or $\omega_c$ (closed hand). For each image, we extract the $l_{AB}$ and $l_{CD}$ distances from all convexity defects of the hand. Each training vector is described as $\{L, \omega_i\}$, where L is the set of $l_{AB}$ and $l_{CD}$ distances of a hand. A support vector machine is trained on the resulting 14-dimensional descriptor vectors, with a radial basis function kernel that nonlinearly maps the vectors to a higher dimension where a linear separating hyperplane can be found.
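The paper does not say how a variable number of convexity defects becomes a fixed 14-dimensional vector. One plausible reading, assumed here, is 7 defects × 2 distances; the sketch below orders defects by $l_{CD}$, keeps the deepest seven, and zero-pads the rest. The function name and the defect tuple layout `(A, B, C, D)` are hypothetical.

```python
import math


def defect_features(defects, n_defects=7):
    """Fixed-length descriptor [l_AB_1, l_CD_1, ..., l_AB_n, l_CD_n] for one hand.

    defects: list of (A, B, C, D) 2-D points per convexity defect, where A/B are the
    start/end points, C the depth point and D the farthest hand point (hypothetical layout).
    ASSUMPTION: 14 dimensions = 7 defects x 2 distances; deepest defects (largest l_CD)
    come first, extras are dropped and missing slots are zero-padded.
    """
    pairs = [(math.dist(a, b), math.dist(c, d)) for a, b, c, d in defects]
    pairs.sort(key=lambda p: p[1], reverse=True)
    pairs = pairs[:n_defects] + [(0.0, 0.0)] * max(0, n_defects - len(pairs))
    return [value for pair in pairs for value in pair]
```

The resulting vectors could then be fed to any RBF-kernel SVM, for example scikit-learn's `SVC(kernel="rbf")`. Ordering the defects consistently matters, because an SVM compares vectors position by position.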

Since there is no need to track individual finger movements, the positions of a hand in the two camera views are given by two coordinates, $(x_L, y_L)$ and $(x_R, y_R)$. The coordinate of a hand in each camera view is calculated as the center of gravity of the hand segment, which smooths the jitter caused by segmentation. After image rectification, $y_L = y_R$. The disparity along the x direction is computed as $d = x_L - x_R$, and the depth z of the point is given by:

$$z = \frac{fT}{d} \qquad (3)$$

where f is the focal length and T is the baseline of the stereo camera. Note that f and d in equation (3) are expressed in pixels, so z has the same unit as T.
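Putting the center of gravity and the depth formula $z = fT/d$ together gives the sketch below. It assumes rectified binary hand masks for the left and right views, a focal length `f_px` in pixels, and a baseline in whatever metric unit the depth should be expressed in; these names are hypothetical.

```python
def centroid(mask):
    """Center of gravity (x, y) of a binary hand mask given as rows of 0/1."""
    sum_x = sum_y = count = 0
    for y, row in enumerate(mask):
        for x, value in enumerate(row):
            if value:
                sum_x += x
                sum_y += y
                count += 1
    if count == 0:
        raise ValueError("empty hand mask")
    return sum_x / count, sum_y / count


def hand_depth(left_mask, right_mask, f_px, baseline):
    """Depth z = f * T / d from the centroid disparity d = x_L - x_R.

    Assumes both masks come from rectified views, so only x differs.
    """
    x_left, _ = centroid(left_mask)
    x_right, _ = centroid(right_mask)
    d = x_left - x_right
    if d <= 0:
        raise ValueError("disparity must be positive")
    return f_px * baseline / d
```

For example, with `f_px = 500`, a 6 cm baseline and a 10-pixel disparity, the hand is estimated to be 300 cm away; averaging all hand pixels into one centroid is what damps segmentation jitter in this estimate.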

Existing hand interaction is strongly limited by the current two-hand rotation gesture, owing to the lack of research on fist kinematics. A single fist rotation detector (FRD) is crucial for implementing the SH-GMR activity, which makes it possible to control different objects with two hands simultaneously. With this in mind, a feature-based FRD was proposed to extract robust and accurate fist rotation angles [9]. The features found on fists, called "fist lines", are three clearly visible dark lines between the index, middle, ring, and pinky fingers.

The FRD is a three-step approach. The first step is fist shape segmentation, which locates a single fist in a search window; a clustering process is used to decide the fist position along the arm. The second step finds rough rotation angles with histograms of feature gradients computed using the Laplacian of Gaussian (LoG), and then refines the angles to higher accuracy within (-90°, 90°) with constrained multiple linear regression. The third step determines the angle within (-360°, 360°) by making use of the distribution of other edge features on the fist.
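The idea behind the rough-angle stage can be illustrated with a simplified stand-in: a magnitude-weighted histogram of gradient orientations, folded into [-90°, 90°) because a dark line has no preferred direction. This sketch swaps the LoG filtering for plain central differences and omits the regression refinement and the (-360°, 360°) disambiguation, so it is not the FRD itself; the function name and bin count are assumptions. Note that it returns the gradient orientation, which is perpendicular to the direction of a fist line.

```python
import math
from typing import List


def dominant_gradient_orientation(img: List[List[float]], bins: int = 36) -> float:
    """Centre (degrees, in [-90, 90)) of the strongest bin of a magnitude-weighted
    histogram of gradient orientations, using central differences."""
    h, w = len(img), len(img[0])
    width = 180.0 / bins
    hist = [0.0] * bins
    for y in range(1, h - 1):
        for x in range(1, w - 1):
            gx = (img[y][x + 1] - img[y][x - 1]) / 2.0
            gy = (img[y + 1][x] - img[y - 1][x]) / 2.0
            mag = math.hypot(gx, gy)
            if mag == 0.0:
                continue
            a = math.degrees(math.atan2(gy, gx))  # in (-180, 180]
            # Fold opposite directions together: orientation modulo 180 degrees.
            if a >= 90.0:
                a -= 180.0
            elif a < -90.0:
                a += 180.0
            idx = min(int((a + 90.0) // width), bins - 1)
            hist[idx] += mag
    best = max(range(bins), key=lambda i: hist[i])
    return -90.0 + (best + 0.5) * width
```

The result is only accurate to half a bin (2.5° with 36 bins), which is why a refinement step such as the regression described above is needed.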


5 Benefit Evaluation with Two Examples


We implemented two simple virtual science experiments to demonstrate the improvement. Figure 4(a) shows an analog circuit experiment that helps students learn how to measure electrical quantities with a multimeter. In the virtual environment, a student can turn on the multimeter and turn its dial to the correct setting with a single hand. The student then uses both hands to drag the two probes onto the resistor. If the setting and the connections are correct, the resistance value can be read from the multimeter's display. In the circuit experiment, all electronic components are listed in a virtual toolbox, and students can take the components they need out of the toolbox and build circuits in the space.

Figure 4(b) shows a virtual environment for chemical experiments. Various kinds of experimental equipment are placed on the table in the space, including beakers, test tubes, flasks and alcohol lamps. Different chemicals can be found in virtual containers, and text descriptions of their chemical compositions pop up when users put their hands on them. The figure shows a user holding a flask containing a chemical liquid in his right hand and a beaker containing another chemical in powder form in his left hand. He is pouring the liquid from the flask into the beaker to trigger a chemical reaction. The shifting and moving of each object is fully controlled by one hand. The chemical reaction can be displayed as color changes, animations, sound effects, etc., to give the user feedback on his operations.
Fig. 4. Simple applications of the virtual learning system: (a) multimeter; (b) chemical experiment.


6 Conclusion 


The objective of this paper is to improve the efficiency of, and interest in, online teaching and learning by means of modern computer science technologies: a 3D virtual learning system for STEM education. In the proposed system, students can carry out virtual STEM experiments with an advanced hand gesture interface in a VR environment. The theoretical reasoning and the two examples above illustrate the improvement over current e-learning paradigms. Fully functional online education systems aimed at particular grades and disciplines are urgently needed. Future research should focus on usability studies of more applications to better understand their benefits.
Abstract. This paper reports a new system for prototyping circuits called the Visible Breadboard. The Visible Breadboard is a solderless breadboard that allows users to make or erase physical wiring with tangible input by hand and to see the voltage level of each hole at all times through colored LED light.

The Visible Breadboard has 60 solid-state relays arranged in parallel crosses and controlled by a micro-controller. These relays connect the 36 holes on the system surface. The connected holes work as the wiring of the circuit into which users can insert electronic materials. Each hole has an AD converter function that works as a voltmeter, and a full-color LED. The voltage of each hole is visualized by these full-color LEDs. Users operate the system by touching the surface with their fingertips. Users can also connect the Visible Breadboard to a PC, in which case it functions as a new kind of interface for developing and sharing circuits.

Our experimental results showed that this device enables users to build circuits faster and more easily than an ordinary solderless breadboard.


Keywords: Rapid Prototyping, Physical Computing, HCI.


1 Introduction


Nowadays there is great interest in hobby electronics and DIY. Open source programming languages and IDEs such as Arduino [1] and Processing [2] are in widespread use. With a solderless breadboard, soldering is no longer required to build electronic circuits for these purposes. Hobby electronics and DIY have become easy and accessible thanks to the instructions spread on the web. Moreover, there are many communities and people interested in DIY and hobby electronics on the internet, and their communications can be seen on SNS.


Learning electronics and having knowledge of electronic circuits helps people understand energy and systems. When we make a circuit, it is very easy to understand power consumption, for example by comparing a large actuator with a small LED.

Many articles and tips regarding DIY and hobby electronics are available on the 
web. In these communities, it is easy to share schematic diagrams and pictures of 
circuits, but very difficult to share actual circuits. 

Furthermore, although a great deal of information about hobby electronics and DIY 
is available, it is still difficult to understand what happens in the actual circuits and to 
share these circuits. IDEs have become easier to use, but building a circuit is still
manual work and difficult for a beginner. In the real world, we cannot see voltages in 
the circuit and are not able to “UNDO” our actions. 

When we want to know the voltage in a circuit, we use a voltmeter. It shows the voltage on an analog meter or LCD display, but does not visualize the voltage on the circuit itself. It is also difficult to measure multiple points in the circuit at the same time. Moreover, when we teach electronics in a classroom or workshop, it is difficult to share a circuit, and mistakes are often made when copying one.

To address these problems, we have developed a new system, which we named the 
Visible Breadboard (Fig. 1). The Visible Breadboard has dynamic circuit connections 
for each hole using solid state relays and visualizes the voltage on the surface with a 
colored light. 

From the viewpoint of programming materials [3], circuits made of solid state relays can be defined as a “real programming material” whose physical characteristics change through programming and electricity.

This paper first reviews related research and discusses why our research is relevant in Section 2. Secondly, it explains the implementation and functions of the device in Sections 3 (hardware), 4 (middleware) and 5 (software). Thirdly, it presents the experiments we conducted in Section 6. It then discusses the limitations and experimental results in Section 7. Lastly, Section 8 concludes with possible future work.


[Figure: concept picture labelled “REAL & PHYSICAL CIRCUIT” and “PROGRAMMABILITY”; system overview showing holes to insert electrical materials and a capacitance touch panel with holes]


Fig. 1. (left) Concept picture showing “real circuit”, “visible voltage”, “circuit programmability”, and “share” for prototyping purposes. (right) System overview
2 Related Work
2.1 Tangible Prototyping Tools


Many tangible devices for prototyping have been developed, such as the Algoblock, developed by Suzuki and Kato [4]. Algoblock is a tangible programming language, developed as advanced research aimed at collaboration and visualization. It is good for collaborative learning and programming, but it is not aimed at prototyping circuits.

There are also tangible prototyping tools for electronics, such as react3D Electricity [5] and Denshi blocks [6]. react3D Electricity is a tangible [7] circuit simulator, but users cannot use real electronic materials with it. A Denshi block is a physical block that contains an electronic component. Users can make a circuit with these blocks, but it is difficult to add other electronic materials that are not included in the Denshi block package.

Research has also been done on the visualization of power consumption, such as Flo [8]. However, as a visualization tool, Flo does not target hobby electronics or DIY; it is a tool for monitoring electricity in the home.

While there has been significant advanced research on prototyping tools, there is no research on tangible prototyping tools that use real electronic materials.


2.2 Position of Study 


Here we show the position of this study in Table 1. This study focuses on the user experience with “real circuit”, “visualization of voltage”, and “circuit programmability”.


Table 1. Position of this study

                | Circuit programmability: available | Circuit programmability: not available
Real circuit    | This research*                     | Denshi Block [6] (nondigital), ordinary solderless breadboard
Virtual circuit | react3D Electricity* [5]           | -

* Visibility on electricity


3 Hardware Implementation 


This section first describes the composition of our system (Section 3.1) and then describes each module in processing order.


3.1 Visible Breadboard System Prototype


The Visible Breadboard Prototype system is composed of two sub-systems. The first is a device that functions as a solderless breadboard, which we call the Visible Breadboard Device in this paper. The second is software for a personal computer, which we call the Visible Breadboard Software and describe in Section 5.


3.2 Visible Breadboard Device 


From a functional viewpoint, the Visible Breadboard Device is separated into four 
modules: sensor board with holes, solid state relay board, voltage sensing board, and 
full color LED board. The connections for each module are shown in Fig. 2. These 
four modules are assembled into three physical layers. 


[Figure: side-view system diagram; labels include top board, LED board, capacitance change, components, making circuit, and section numbers 3.3-3.5]


Fig. 2. System diagram and the connections between the four modules (side view). The Visible Breadboard Device has a three-layer structure: sensor board, LED board, and system board.


The Visible Breadboard Device is controlled by a single micro-controller (Arduino MEGA with an ATmega1280).


3.3 Sensor Board with Holes


This module detects the position of a user's finger on the surface by sensing a capacitance change (as in SmartSkin [9]). A finger touch changes the capacitance of the metal pads set on the surface of the top board, so users can give input by touching and tracing the surface with a finger. The four pads around each hole are connected horizontally and vertically, enabling the system to detect the coordinate position of the finger, as shown by the arrows in Fig. 3. This module looks for a capacitance change by cross-sensing, that is, by vertical and horizontal position sensing. Each hole on this board is connected to both the solid state relay board and the voltage sensing module.
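Cross-sensing can be sketched as picking the strongest row line and the strongest column line. This assumes the six row lines and six column lines each report a capacitance change relative to an untouched baseline; the function name, the baseline subtraction and the threshold are illustrative assumptions rather than the device's firmware.

```python
from typing import Optional, Sequence, Tuple


def locate_touch(row_deltas: Sequence[float], col_deltas: Sequence[float],
                 threshold: float) -> Optional[Tuple[int, int]]:
    """Return (row, col) of the touched hole, or None if no line exceeds the threshold.

    row_deltas / col_deltas: measured capacitance minus baseline for each
    horizontally / vertically connected pad line (hypothetical inputs).
    """
    r = max(range(len(row_deltas)), key=lambda i: row_deltas[i])
    c = max(range(len(col_deltas)), key=lambda j: col_deltas[j])
    if row_deltas[r] < threshold or col_deltas[c] < threshold:
        return None
    return r, c
```

Because a fingertip covers pads on one row line and one column line at the same time, 6 + 6 measurements are enough to resolve 36 positions.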
3.4 Solid State Relay Board


The Solid State Relay Board is placed in the bottom physical layer of the Visible 
Breadboard Device (Fig. 2). This module (Fig. 4) has 60 solid state relays placed 
between the 36 holes connected to the Sensor Board. By switching these solid state 
relays ON / OFF, the connection status of the holes is changed dynamically. Active 
solid state relays form the wiring between the holes and complete a circuit with the 
electronic materials inserted into the Sensor Board. 
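One layout consistent with the counts in the text is a 6 × 6 grid with one relay between every pair of horizontally or vertically neighbouring holes: 6 × 5 + 5 × 6 = 60 relays. Assuming that layout, the sketch below enumerates the relays and groups holes into electrically connected nets with union-find; the function names and relay ordering are hypothetical.

```python
from typing import Dict, List, Set, Tuple

Hole = Tuple[int, int]  # (row, col)


def grid_relays(n: int = 6) -> List[Tuple[Hole, Hole]]:
    """One relay per pair of horizontally or vertically adjacent holes."""
    relays = []
    for r in range(n):
        for c in range(n):
            if c + 1 < n:
                relays.append(((r, c), (r, c + 1)))
            if r + 1 < n:
                relays.append(((r, c), (r + 1, c)))
    return relays


def nets(n: int, active: Set[int]) -> List[Set[Hole]]:
    """Group holes into connected nets given the indices of relays that are ON."""
    parent: Dict[Hole, Hole] = {(r, c): (r, c) for r in range(n) for c in range(n)}

    def find(h: Hole) -> Hole:
        while parent[h] != h:
            parent[h] = parent[parent[h]]  # path halving keeps trees shallow
            h = parent[h]
        return h

    relays = grid_relays(n)
    for i in active:
        a, b = relays[i]
        parent[find(a)] = find(b)  # an ON relay merges the two holes' nets
    groups: Dict[Hole, Set[Hole]] = {}
    for h in parent:
        groups.setdefault(find(h), set()).add(h)
    return [g for g in groups.values() if len(g) > 1]  # drop unconnected holes
```

Each returned set is one "wire" in the circuit: every hole in it sits at the same node, apart from the small relay resistance discussed in Section 3.7.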


[Figure: top view with touch pads; cross sensing detects the position of the finger]


Fig. 3. Top view of the Visible Breadboard Device


[Figure: solid state relays; circuitized line made by finger touch on the upper board]


Fig. 4. SSR board on the bottom layer of the Visible Breadboard Device


These solid state relays are controlled by eight shift registers, which are in turn controlled by the micro-controller via serial connections.
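Eight 8-bit shift registers provide 64 outputs, enough for the 60 relays. The sketch below packs the relay ON/OFF states into one byte per register, assuming relay i drives bit i % 8 of register i // 8. This mapping and the function name are hypothetical; the real order depends on how the registers are daisy-chained. On the Arduino side, such bytes would typically be clocked out with `shiftOut()` and then latched.

```python
from typing import List, Sequence


def pack_relay_states(states: Sequence[bool], n_registers: int = 8) -> List[int]:
    """Pack relay ON/OFF flags into one byte per 8-bit shift register.

    Assumed wiring: relay i -> register i // 8, bit i % 8.
    """
    if len(states) > 8 * n_registers:
        raise ValueError("more relays than shift-register outputs")
    out = [0] * n_registers
    for i, on in enumerate(states):
        if on:
            out[i // 8] |= 1 << (i % 8)
    return out
```

Sending the whole 8-byte frame on every change keeps all 60 relays consistent, at the cost of rewriting relays that did not change.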


3.5 Drive Sensing Voltage of Each Hole 


This module (Fig. 5) measures the voltage at the 36 points. Drive control of the solid state relays switches the connection between the micro-controller's AD converter and the holes of the Sensor Board into which users insert the electronic materials. Although this module can measure only six points at once, we think this is sufficient for this device, because the full-color LED system (which shows the voltage in color) is also lit by drive control, six LEDs at a time.


Fig. 5. Drive sensing module using SSRs in the bottom layer (close-up)


[Figure: LEDs light up with a gradation color determined by the voltage value]


Fig. 6. LED display modules (with the top board opened)
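The drive-sensing scheme can be sketched as a time-multiplexed scan: route one group of six holes to the AD converter, read six channels, move on to the next group, and convert raw 10-bit readings (Table 2: 1024 steps over 0-5 V) to volts. The `read_group` callback is a hypothetical stand-in for switching the SSRs and reading the ADC. Scaling by 5/1023 rather than 5/1024 is an assumed convention.

```python
from typing import Callable, Dict, List


def scan_holes(read_group: Callable[[int], List[int]], n_holes: int = 36,
               group_size: int = 6, vref: float = 5.0, adc_max: int = 1023) -> Dict[int, float]:
    """Scan all holes six at a time and return {hole index: volts}."""
    volts: Dict[int, float] = {}
    for g in range(n_holes // group_size):
        raw = read_group(g)  # hypothetical: connect group g to the ADC, read 6 channels
        for k, value in enumerate(raw):
            volts[g * group_size + k] = value * vref / adc_max
    return volts
```

With six groups per scan and the 30 Hz sampling frequency in Table 2, each group would be connected for roughly 1/180 s (about 5.6 ms) per scan.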


3.6 Displaying the Voltage


Users can choose the “start voltage color” and the “end voltage color” in the configuration mode. This module (Fig. 6) changes its LED color by gradation according to the voltage value (Fig. 7). The color is determined by Equation (1):

$$ C = \left( R_s + \frac{R_e - R_s}{L}\,v,\; G_s + \frac{G_e - G_s}{L}\,v,\; B_s + \frac{B_e - B_s}{L}\,v \right) \qquad (1) $$

where C is the color of the full-color LED, v is the voltage, (R_s, G_s, B_s) is the start color, (R_e, G_e, B_e) is the end color, and L is the AD converter's maximum value.
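A direct transcription of Equation (1), assuming v is the raw AD converter reading and L = 1023 for a 10-bit converter. The clamping of v and the rounding to integer LED levels are added assumptions, and the function name is hypothetical.

```python
from typing import Tuple

RGB = Tuple[int, int, int]


def voltage_color(v: int, start: RGB, end: RGB, L: int = 1023) -> RGB:
    """Equation (1): linear gradation from the start color (v = 0) to the end color (v = L)."""
    v = max(0, min(v, L))  # keep readings inside the converter's range
    r, g, b = (round(s + (e - s) * v / L) for s, e in zip(start, end))
    return (r, g, b)
```

For example, with a blue start color and a red end color, a reading near mid-scale, `voltage_color(511, (0, 0, 255), (255, 0, 0))`, gives roughly equal red and blue components.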
3.7 Hardware Specification of the Visible Breadboard Device


The hardware specification of the Visible Breadboard Device is shown in Table 2. We developed four Visible Breadboard Devices, which can be connected to each other to make 144 holes available. Because this device uses solid state relays for the wiring, there is some resistance between the holes. However, this resistance is within an acceptable margin of error for the circuits that can be built on this hardware. The distance between two holes is approximately 12 times larger than on an ordinary solderless breadboard. This is large, but it allows users to use extensions (Fig. 7) for small electronic materials or IC chips.
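As a quick check of the "approximately 12 times" figure, assuming the standard 0.1 in (2.54 mm) hole pitch of an ordinary solderless breadboard and the 3.2 cm spacing listed in Table 2:

$$ \frac{32\ \text{mm}}{2.54\ \text{mm}} \approx 12.6 $$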


Table 2. Hardware specification

Weight                     | 1200 g
Size                       | 24.2 cm (W) × 24.2 cm (L) × 6 cm (H)
Number of holes            | 36
Distance between two holes | 3.2 cm
Voltage sampling frequency | 30 Hz
AD converter               | 1024 steps (0-5 V)
LED color steps            | RGB color, 8096 steps
Micro-controller           | ATmega1280 (Arduino MEGA)
Clock                      | 16 MHz
Input voltage              | 0-5 V (max. 48 V with optional op-amp)
Resistance between holes   | 20 Ω


Fig. 7. Parts extension for the Visible Breadboard Device (wire with socket)
4 Middleware Implementation


The Visible Breadboard Device has several features for interaction with users. The following sections describe basic wiring and visualization, the “checking the connection” function, UNDO/REDO, and the digital color mode.


4.1 Basic Wiring and Visualization 


Users control the Visible Breadboard Device by touching the pads on the surface. This section describes the connection algorithm. If users touch one hole (Fig. 8 (left).1) and then a neighboring hole (Fig. 8 (left).2), these two holes are connected, or disconnected if they were already connected (Fig. 8 (left).3). If a hole is connected to others, the LED for the voltage visualization (placed below the sensor board) turns on. For example, users can make the circuits shown in Fig. 8 (right).
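The two-touch connection algorithm can be sketched as a small state machine. It assumes that only horizontally or vertically adjacent holes share a relay, and that a second touch on a non-adjacent hole is simply ignored. That policy, the class name and its methods are hypothetical illustrations, not the device's firmware.

```python
from typing import FrozenSet, Optional, Set, Tuple

Hole = Tuple[int, int]  # (row, col) on the 6 x 6 grid


class TouchWiring:
    """Toggle the relay between two neighbouring holes touched one after the other."""

    def __init__(self) -> None:
        self.connected: Set[FrozenSet[Hole]] = set()  # pairs whose relay is ON
        self.pending: Optional[Hole] = None           # first touch, awaiting a second

    def touch(self, hole: Hole) -> Optional[bool]:
        """Return True if a relay was switched ON, False if OFF, None if nothing changed."""
        if self.pending is None:
            self.pending = hole
            return None
        first, self.pending = self.pending, None
        # Assumption: only horizontally or vertically adjacent holes share a relay.
        if abs(first[0] - hole[0]) + abs(first[1] - hole[1]) != 1:
            return None
        pair = frozenset((first, hole))
        if pair in self.connected:
            self.connected.remove(pair)  # already connected: cut the wire
            return False
        self.connected.add(pair)         # not yet connected: make the wire
        return True

    def led_on(self, hole: Hole) -> bool:
        """A hole's voltage LED is lit once the hole is connected to another hole."""
        return any(hole in pair for pair in self.connected)
```

Touching the same neighbouring pair twice in succession therefore makes and then cuts the wire, matching the connect-or-cut behaviour described above.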


Example 1: Fig. 8 (right-up) is a circuit with resistors, an LED, and a capacitor. The 
RED light shows that the voltage is Vcc (5 V) and the BLUE light shows that the vol- 
tage is GND (0 V). In Fig. 10, LED (A) is on because there is an electric current in 
the circuit. 


Example 2: Fig. 8 (right-bottom) is a circuit with resistors only. The RED light shows 
that the voltage is GND (0 V) and the BLUE light shows that the voltage is Vcc (5 V). 
In Fig. 11, the holes on each side of interval (A) have different colors because there is 
no connection (left hole shows GND and right hole shows Vcc). 


Fig. 8. (left) Connection Algorithm (right-up) Example 1: RED is V+, BLUE is GND (right- 
bottom) Example 2: RED is GND, BLUE is V+ 
4.2 “Checking the Connection” Function


The Visible Breadboard Device has an indicator for voltage visualization but there is 
no indicator for the connection of the holes. To compensate for this, the Visible 
Breadboard Device has a “checking the connection” function. When users press the 
button on the right side of the sensor board, the voltage visualization LEDs blink in 
sequence to show the connection of each hole. 


4.3 UNDO/REDO 


The Visible Breadboard Device has a wiring UNDO and REDO function, with which users can easily erase and remake a connection. Furthermore, if users push the UNDO and REDO buttons repeatedly, the wiring in the circuit is cut and connected rapidly, producing high-speed voltage changes. This is useful for viewing waveforms on the breadboard.

Additionally, if users push the REDO button slowly, they can see, step by step and in order, what happens in the circuit as each connection is made.
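UNDO/REDO over wiring can be modelled with two stacks of toggle actions, where undoing or redoing an action simply flips the same relay again. The sketch below assumes each user action toggles exactly one hole pair; the class and method names are hypothetical.

```python
from typing import FrozenSet, List, Set, Tuple

Hole = Tuple[int, int]
Pair = FrozenSet[Hole]


class WiringHistory:
    """Undo/redo over wiring toggles; each action flips one relay."""

    def __init__(self) -> None:
        self.connected: Set[Pair] = set()
        self.undo_stack: List[Pair] = []
        self.redo_stack: List[Pair] = []

    def _flip(self, pair: Pair) -> None:
        self.connected ^= {pair}  # add the pair if absent, remove it if present

    def toggle(self, a: Hole, b: Hole) -> None:
        pair = frozenset((a, b))
        self._flip(pair)
        self.undo_stack.append(pair)
        self.redo_stack.clear()  # a new action invalidates the redo history

    def undo(self) -> bool:
        if not self.undo_stack:
            return False
        pair = self.undo_stack.pop()
        self._flip(pair)
        self.redo_stack.append(pair)
        return True

    def redo(self) -> bool:
        if not self.redo_stack:
            return False
        pair = self.redo_stack.pop()
        self._flip(pair)
        self.undo_stack.append(pair)
        return True
```

Pressing UNDO and REDO alternately flips the same relay each time, which is the rapid cut-and-connect behaviour used above to produce visible voltage changes.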


4.4 Digital Color Mode 


The Visible Breadboard Device has two visualization modes. The first is a basic visualization using a gradation of LED color, like thermography. The second is a “digital color mode”, in which the LED shows a different color for every 1 V step. This is helpful for checking and understanding what happens in digital circuits.
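The digital color mode can be sketched as quantizing the measured voltage to 1 V bands and looking up one palette entry per band, instead of interpolating as in Equation (1). The palette below is a hypothetical example, since the device lets users choose their own colors.

```python
from typing import List, Tuple

RGB = Tuple[int, int, int]

# Hypothetical palette: one color per 1 V band from 0 V to 5 V.
PALETTE: List[RGB] = [
    (0, 0, 255),    # 0-1 V
    (0, 255, 255),  # 1-2 V
    (0, 255, 0),    # 2-3 V
    (255, 255, 0),  # 3-4 V
    (255, 128, 0),  # 4-5 V
    (255, 0, 0),    # 5 V and above
]


def digital_color(volts: float, palette: List[RGB] = PALETTE) -> RGB:
    """Pick a discrete color for each 1 V step instead of a smooth gradation."""
    step = int(volts)  # truncates toward zero, i.e. floor for non-negative voltages
    return palette[max(0, min(step, len(palette) - 1))]
```

Hard color boundaries make logic levels easy to tell apart at a glance, whereas the gradation mode is better for seeing gradual changes.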


4.5 Other Features 


The Visible Breadboard Device has some other features: “Sound”, “Auto Save”, and “Color palette for the voltage”. The Sound system gives users sound feedback in addition to visual feedback. The Auto Save system enables users to emulate an ordinary solderless breadboard: with an ordinary solderless breadboard, the circuit remains connected without electricity, whereas with the Visible Breadboard, the circuit disappears if the power is turned off. The Auto Save feature compensates for this shortcoming. With the color palette for the voltage, users can set their preferred color configuration.


5 Software Implementation on PC 


The Visible Breadboard Software, which runs on the PC, has several utilities for the 
Visible Breadboard Device. This section describes the “Voltage visualization on the 
PC” and “Copy the data from device” functions. 


5.1 Voltage Visualization on PC


The Visible Breadboard Software has a voltage visualization feature (Fig. 9). When 
the Visible Breadboard Device is connected to a PC via a USB cable, data from the 


82 Y. Ochiai 


AD Converter are sent to the PC. The software captures the voltage data of the 36 
holes on the Visible Breadboard Device and shows them as a 3D bar graph. The col- 
ors on the bar graph change corresponding to the color settings of the Visible Bread- 
board Device’s voltage visualization. 




Fig. 9. Voltage visualization on PC. Users can also share circuits built with this device via the internet.


5.2 Circuit Share 


The Visible Breadboard Software can capture all the data (connection data of holes, visualization data, and settings) from the Visible Breadboard Device and import and export these data as files. This enables users to copy and share actual physical circuits via the internet or email.
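The text does not specify the file format, so the following is only a sketch of one possible export/import round trip, using JSON to hold the connected hole pairs and the settings. The field names, the `version` key and both function names are hypothetical.

```python
import json
from typing import Dict, FrozenSet, Set, Tuple

Hole = Tuple[int, int]


def export_circuit(connected: Set[FrozenSet[Hole]], settings: Dict[str, object]) -> str:
    """Serialise hole connections and settings to JSON text (hypothetical format)."""
    pairs = sorted(sorted(p) for p in connected)  # deterministic order for diffing and sharing
    return json.dumps({"version": 1, "connections": pairs, "settings": settings}, sort_keys=True)


def import_circuit(text: str) -> Tuple[Set[FrozenSet[Hole]], Dict[str, object]]:
    """Inverse of export_circuit: rebuild the set of connected hole pairs."""
    data = json.loads(text)
    connected = {frozenset(tuple(h) for h in pair) for pair in data["connections"]}
    return connected, data["settings"]
```

A plain-text format like this can be attached to an email or posted online and then loaded onto another device to recreate the same physical wiring.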


6 Evaluation 


6.1 Experimental Design and Participants 


We ran a task-oriented circuit-building test with ten users (nine male and one female) with an average age of 20.8.

Before starting the test, we spent two minutes explaining how to use an ordinary solderless breadboard and the Visible Breadboard, and let the participants practice on each for a few minutes. After this, all participants knew how to use both systems.

For the first test, users made 25 connections on the Visible Breadboard. This was to verify the capacitance sensor and the resistance of each participant's finger skin.


6.2 Questionnaire


After the wiring test, there were eight circuit-making tasks. We tested four kinds of circuit ((a), (b), (c), and (d)), and users made each of them on both the ordinary solderless breadboard and the Visible Breadboard. For (a) and (b), we showed the participants a picture of the correct circuit, and they built the circuit from the picture. For (c) and (d), we showed the participants a schematic, and they then built the circuits on both the ordinary solderless breadboard and the Visible Breadboard.


Table 3. Experimental results of 10 users on the wiring tasks (unit: seconds)

tasks \ users                     | A   | B   | C   | D   | E   | F  | G  | H    | I   | J    | AVG
Make wiring on Visible Breadboard | 12  | 16  | 14  | 12  | 15  | 10 | 11 | 10.8 | 30  | 10.9 | 14.17
(a) Breadboard                    | 74  | 214 | 112 | 71  | 152 | 84 | 53 | 155  | 98  | 51   | 106.4
(a) Visible Breadboard            | 50  | 60  | 49  | 29  | 49  | 57 | 23 | 36   | 51  | 24   | 42.8
(b) Breadboard                    | 102 | 221 | 157 | 129 | 137 | 106| 89 | 183  | 150 | 45   | 131.9
(b) Visible Breadboard            | 30  | 45  | 36  | 35  | 43  | 65 | 22 | 66   | 95  | 57   | 49.4
(c) Breadboard                    | 115 | 154 | 101 | 39  | 62  | 87 | 26 | 63   | 77  | 55   | 77.9
(c) Visible Breadboard            | 37  | 42  | 37  | 26  | 32  | 60 | 15 | 26   | 31  | 85   | 39.1
(d) Breadboard                    | 81  | 119 | 189 | 94  | 83  | 30 | 32 | 116  | 38  | 51   | 83.3
(d) Visible Breadboard            | 37  | 58  | 52  | 49  | 30  | 45 | 27 | 77   | 52  | 63   | 49
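The averages in Table 3 can be reproduced, and the per-task speed-up computed, from the per-participant times with a few lines of Python. The data are copied from the table; the function name and the speed-up ratio (breadboard time divided by Visible Breadboard time) are our own additions for illustration.

```python
from statistics import mean

# Completion times in seconds for participants A-J, copied from Table 3:
# task -> (ordinary breadboard, Visible Breadboard)
TIMES = {
    "a": ([74, 214, 112, 71, 152, 84, 53, 155, 98, 51], [50, 60, 49, 29, 49, 57, 23, 36, 51, 24]),
    "b": ([102, 221, 157, 129, 137, 106, 89, 183, 150, 45], [30, 45, 36, 35, 43, 65, 22, 66, 95, 57]),
    "c": ([115, 154, 101, 39, 62, 87, 26, 63, 77, 55], [37, 42, 37, 26, 32, 60, 15, 26, 31, 85]),
    "d": ([81, 119, 189, 94, 83, 30, 32, 116, 38, 51], [37, 58, 52, 49, 30, 45, 27, 77, 52, 63]),
}


def summarize(times=TIMES):
    """Return {task: (mean breadboard time, mean Visible Breadboard time, speed-up ratio)}."""
    out = {}
    for task, (breadboard, visible) in times.items():
        b, v = mean(breadboard), mean(visible)
        out[task] = (b, v, b / v)
    return out
```

The resulting ratios are about 2.5×, 2.7×, 2.0× and 1.7× for tasks (a) to (d). These figures come from averages only; the table alone does not establish statistical significance.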


6.3 Results 


The results of the user experiments are shown in Table 3. There were ten participants, A to J. The average column shows that, for every circuit tested in this experiment, the Visible Breadboard was faster than the ordinary solderless breadboard.

After the wiring experiment, we tested the voltage visualization effect of this device. All participants correctly answered the question “Where is the GND or Vcc?”. It was easy for people to distinguish Vcc from GND with the Visible Breadboard.


7 Discussion 


The difference in speed between the ordinary solderless breadboard and the Visible Breadboard depended on the complexity of the circuit. When people needed many breadboard wires to build the circuit on the ordinary solderless breadboard, the difference in speed increased quickly.

The Visible Breadboard speeds up circuit building, enables users to share circuits, and visualizes the voltage in the circuit. It is useful for the first circuits that beginners, or children at school, make.

We presented this system in many places, including SIGGRAPH [10]. People suggested that the full-color LED display should be replaced by an LCD display, because the drive control makes the LED display blink. We think this will be a better option once LCD displays become lighter.




8 Conclusion and Future work 


We would like to make this system a product and gather additional data for future research. This paper introduced a new kind of system for prototyping and sharing circuits. It makes building circuits easier and faster. Moreover, the ability to visualize the voltage was certainly very effective for people using the device.
Appendix


Visible Breadboard Instruction Movie (Video on YouTube).


http://www.youtube.com/watch?v=nsL8t_pgPjs
Abstract. This paper presents the design of a service-oriented architecture to support dynamic cultural content acquisition on a mobile augmented reality system for reanimating cultural heritage. The reanimating cultural heritage system provides several domain interfaces (Web, Web3D, Mobile and Augmented Reality) for presenting cultural objects accessed from an aggregated RCH data repository via web services. This paper largely focuses on the augmented reality system, but discusses the Web, Web3D and Mobile domains to set the paper in context. The mobile augmented reality system performs multiple object tracking to augment real-world cultural object scenes with digital media content. The proposed mobile augmented reality system is composed of a mobile interface (smartphone, tablet), middleware including the augmented reality SDK and supporting software modules for the augmented reality application, and a web service framework.


Keywords: service-oriented architecture, multiple object tracking, web service 
framework, augmented reality. 


1 Introduction 


Reanimating Cultural Heritage: Reanimating Cultural Heritage is a Beyond Text Large Project [1][2][3] funded by the UK Arts and Humanities Research Council. The resource can be viewed live at www.sierraleoneheritage.org and currently holds more than 3,000 digital cultural objects. The project's full title is ‘Reanimating Cultural Heritage: Digital Repatriation, Knowledge Networks and Civil Society Strengthening in Post-Conflict Sierra Leone’. The Reanimating Cultural Heritage (RCH) project is a “multidisciplinary project concerned with innovating digital curatorship in relation to Sierra Leonean collections dispersed in the global museumscape” [1]. The project is mainly concerned with establishing a digital repository and primary web interface that allows the Sierra Leonean diaspora to access their heritage (cultural objects) digitally, while also allowing the diaspora to contribute their knowledge through a social media context [3]. Fig. 1 illustrates the Home page (with two randomly selected media objects: in this case a video illustrating aluminium pot making, and a cultural object displayed in the ‘From the collection’ interface), and Fig. 2 shows the Browse interface for the digital resource, which lists all the participating museums' collections in a gallery interface.

If users click on an object in the Browse gallery interface, they are taken to that object's results page. Alternatively, they can do a Quick search for an object or select a more comprehensive search from the home page's ‘Search collections’ tab; either way, they eventually arrive at the cultural object's results page.


[Figure: screenshots of the RCH Home page and the Browse page listing museum collections, e.g. Brighton Museum and Art Gallery]


Fig. 1. Home page   Fig. 2. Browse page


Fig. 3 illustrates the standard web view displaying a Test 3D Wicker Basket. Note the Facebook social media interface, which allows the diaspora to input their knowledge to the collections, and the ability to display related objects in a ‘Related Objects’ gallery [3]. While the current live version of the RCH resource does not support 3D media objects, it is relatively easy to add this functionality using an abstraction of WebGL, such as X3DOM. To illustrate this, we have inserted a temporary test object (Test 3D Wicker Basket Object) into the database and included a 3D interface utilizing X3DOM (see Fig. 4).

In addition to the 2D and 3D interface we have developed a mobile version of the 
RCH resource, see. Further, by clicking on the AR tab (see ) when browsing on a 
mobile (tablet or smartphone) device, it is possible to switch to an augmented reality 
view whereby the cultural object of interest can be used to trigger access to media 
contents, such as the description, metadata, videos, images, etc., or if a 3D object 
exists, this can also be displayed along with other media contents. This is discussed 
further in Section 5. 

The RCH resource has been developed using a model-view-controller design pattern, which enables us to connect different views (web, mobile, 3D and AR interfaces) to the same data repository. Connection to the data repository is achieved via a set of web services discussed in Section 2.
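The benefit of the model-view-controller split described above can be shown with a minimal sketch: one repository (model), several views that render the same cultural object differently, and a controller that routes a request to the chosen view. All class names are hypothetical, and the in-memory dictionary stands in for the web-service access to the RCH repository.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Protocol


@dataclass
class CulturalObject:
    accession: str
    title: str
    media: List[str] = field(default_factory=list)


class Repository:
    """Model: a single store shared by every view (stand-in for the web-service layer)."""

    def __init__(self) -> None:
        self._objects: Dict[str, CulturalObject] = {}

    def add(self, obj: CulturalObject) -> None:
        self._objects[obj.accession] = obj

    def get(self, accession: str) -> CulturalObject:
        return self._objects[accession]


class View(Protocol):
    def render(self, obj: CulturalObject) -> str: ...


class WebView:
    def render(self, obj: CulturalObject) -> str:
        return f"<h1>{obj.title}</h1>"


class ARView:
    def render(self, obj: CulturalObject) -> str:
        return f"AR overlay: {obj.title} ({len(obj.media)} media item(s))"


class Controller:
    """Routes a request to the chosen view, always reading from the same repository."""

    def __init__(self, repo: Repository, views: Dict[str, View]) -> None:
        self.repo, self.views = repo, views

    def show(self, accession: str, domain: str) -> str:
        return self.views[domain].render(self.repo.get(accession))
```

Adding a new domain, such as a Web3D view, only requires another `render` implementation; the repository and its data access stay unchanged.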

The main focus of this paper is to discuss the application of augmented reality for Reanimating Cultural Heritage, utilising a new service-oriented architecture (web services) for accessing media content, and the ability to track multiple objects to trigger data access from the RCH database (via a web service) within the AR scene. This architecture also offers advantages for creating a better personalisation approach, for example in scenarios where a user takes images of a museum's object and submits them to a photogrammetry web service to generate a 3D model. That 3D model can then be displayed in the user's home environment, along with data downloaded from the RCH repository's result page for that object, to re-create an AR-based museum experience.


[Figure: screenshots of the result page for the Test 3D Wicker Basket Object, with placeholder description text and a ‘Further Information’ panel (accession number AR.TEMP.Object; other fields Unknown)]


Fig. 3. Result page showing a Test 2D Image of a Wicker Basket   Fig. 4. 3D Model of the Test Wicker Basket

Augmented Reality and Mobile Services: Augmented reality (AR) has become a widely beneficial technique that lets users experience a different perception of cultural objects, represented in real environments with computer-generated media content such as 3D models, labels, text, images and videos [4][5]. One of the current challenges for AR technology is to implement effective AR on mobile platforms. Mobile AR is one of the most recent developments in location-based services and interactive graphics applications, allowing users to visualize and interact with 3D models or media content on mobile devices. Mobile AR has also been implemented efficiently in various innovative applications such as gaming, shopping guides, advertising, edutainment, travel guides, museum guides and medical visualization [6]. Adapting the visualization (e.g. better integration with different view domains), tracking (better multiple object tracking), recognition, interaction (user stories), displays and user interface techniques to real-world scenes and virtual environments can greatly enhance these varied applications [7][8].

Most mobile indoor AR applications today are based on stand-alone or closed platforms and provide users with limited amounts of data or content on top of real-world scenes. In addition, there is no communication channel that allows the AR application to download or obtain dynamic content from other third-party data sources in real time [9]. Another limitation of mobile graphics applications, and of applications in general, that require virtual models is that the models have to be created and designed on desktop computers and then transferred to mobile devices for running or rendering in games or interactive media. Although new-generation mobile devices can render 3D graphics content with good performance, some complicated rendering tasks, such as digital cultural heritage scenes, 3D virtual cities or complicated 3D models, still require more processing power. Therefore, image-based reconstruction or 3D photogrammetry tasks, which involve multiple-image matching and 3D model building, cannot be performed entirely on mobile devices because of their limited resources.

Nowadays, there are some tools, such as Junaio, Layar and Aurasma, that enable mobile users to create and publish their own AR content for indoor and outdoor environments. These applications enable mobile users to create AR environments and save them to their channels on the cloud server. Moreover, some applications, such as Junaio, allow these channels to be accessed from mobile applications through an application programming interface (API). This technique is useful because it allows general mobile users who do not want to, or cannot, develop mobile AR applications to have their own AR environments. However, these AR applications still have some restrictions because they are implemented on closed platforms, such that a user's AR environment can only be retrieved via the commercial application that created it; for example, a Junaio environment cannot be reused in Aurasma. Moreover, most commercial mobile indoor AR applications provide users with limited amounts of data or content for augmenting real-world scenes. There is no communication channel for current AR applications (e.g. Junaio, Layar, Aurasma or specific research applications such as an AR game) that allows them to download or obtain dynamic content from other third-party data sources in real time.

This paper proposes a service-oriented architecture for a mobile AR system that 
exploits an AR SDK, multiple object tracking, an AR supporting application and a web 
service framework to perform basic AR tasks and to acquire dynamic content by 
accessing a photogrammetry service or open content providers over a mobile/wireless 
network. In addition, several AR supporting modules enable mobile users to use and 
manipulate the acquired media content in their preferred AR environments. We then 
look at the service orientation of the mobile augmented reality part of the architecture. 
2 Architectural Requirements 


We propose several key architectural requirements for building the novel service- 
oriented mobile AR platform: 

Service Orientation on a Mobile AR Platform: The service-oriented architecture 
(SOA) fully supports a client-server scheme over a mobile/wireless network. To obtain 
more valuable associated content and significantly increase the usability and 
functionality of the proposed mobile AR application, service orientation is applied to 
the mobile AR platform, which essentially means integrating a web service framework 
into a mobile AR client [12][13]. This feature can be used in both indoor and outdoor 
AR scenarios; for example, an AR browser is an application built on the web service 
framework that shows media content in the real environment. Examples of web 
services that developers can easily access to obtain platform-independent digital 
content include Web Map Services, mash-up services, geospatial and social network 
data, 3D models, and Reanimating Cultural Heritage data. The mobile AR client 
should benefit from being deployed on a service-oriented architecture, which mainly 
provides third-party open services from digital content providers that are available to 
clients on any platform [10][11]. 

Multiple Object Tracking: One of the basic AR tasks is object tracking, which is 
used to track and recognize targeted reference objects so that associated content can 
be revealed on the real scene. In the mobile AR client, the tracking module is designed 
to perform markerless tracking, which requires 3D object tracking so that the system 
can recognize more than one reference object in parallel. Moreover, the system can 
attach several content items to one reference object at the same time, which leads to a 
richer AR environment in terms of the media objects associated with the reference 
objects. That is, multiple object tracking greatly enhances the interpretation of mobile 
AR scenarios and their environments, since mobile users can view a variety of media 
content from multiple reference objects on the screen at the same time. Mobile AR 
applications can also offer features that let users manage and use the revealed 
content, e.g. saving an AR scenario for future use. 

Middleware System and Web Service Provider: The middleware, or back-end 
system, consists of the supporting functions working behind the mobile interface and 
the AR SDK. The middleware and the web service provider are designed to be 
versatile and open, respectively, so that they can be applied efficiently in the many 
mobile AR scenarios that need to obtain and use dynamic digital content from the web 
service provider and that allow mobile users to create their own preferred AR 
environments. The middleware system is composed of a web service framework and 
AR supporting modules that work concurrently with the AR SDK to support the 
usability and adaptability of the acquired AR content. Moreover, some modules are 
designed to create connections and request dynamic content from the web service 
provider through the web service framework. 
3 Service Oriented Mobile Augmented Reality Architecture 


The Service Oriented Mobile AR Architecture (SOMARA) is designed mainly to 
support content and service acquisition on a mobile AR platform. SOMARA is 
composed of three components: 

Mobile Client: The mobile client is an application on a mobile platform (currently 
iPhone and iPad) that incorporates a web service framework and the service interfaces 
of a web service provider into its framework. Thus, the mobile application becomes a 
component in the SOA. In SOMARA, the mobile client is developed natively on iOS. 
The mobile AR client uses the high-quality embedded camera and the touch screen 
user interface to accomplish AR and additional supporting tasks. Fig. 6 shows the 
structure of the mobile client and its components. 

The mobile interface is a front-end component of the mobile client that mainly 
supports interaction between mobile users and the AR application and AR 
environments, as well as interaction with the displayed content via the touch screen. 
Since the SOMARA mobile AR client is developed natively for a mobile platform, it 
provides a touch screen user interface focused on AR tasks and on features that let 
mobile users view and interact with the digital content visualized on the screen. 

An Augmented Reality SDK is a software development kit for native or hybrid 
platforms, designed for mobile AR application development. Several AR SDKs are 
currently available to AR application developers, e.g. ARToolKit, the Qualcomm AR 
SDK and the Metaio SDK. AR SDKs provide basic libraries to perform general AR 
tasks such as object tracking, rendering and visualization. In SOMARA, the Metaio 
native SDK is used in the mobile AR client and native framework. The Metaio native 
SDK works with the AR application to track reference objects, create geometry, and 
load AR content and its features onto a real-world scene, depending on each reference 
object and on the environments and scenarios the system is designed for. Note that we 
have adapted the tracking module to perform multiple object tracking. 

The Augmented Reality application is a component of the mobile AR client in the 
middleware layer, designed to work closely with the Metaio native SDK to process AR 
tasks, e.g. building geometries and visualizing content. The AR application is also 
combined with the web service framework to efficiently request services and receive 
responses, which are dynamic content or final outcomes from the web service 
provider. In addition, some modules in the application work with the Global 
Positioning System (GPS) to process location-based AR content, personalization and 
outdoor AR tasks. The following sections explain each module of the application 
designed to support the proposed features. 

Web Service Framework: The web service framework, implemented with web 
service APIs such as SOAP or REST, is composed of client-side and server-side 
service code. It works alongside the AR application as a middleware layer for 
creating web service connections, sending requests and receiving responses between 
the mobile client and the web service provider. The web service framework includes 
an XML parser and an XML serialization module to process the XML data 
representing the final outcome of the web service provider; the outcome is then 
transferred to the AR application. At the moment we are using XMLHttpRequest for 
server-side responses, but we plan to consider JSON as an alternative to XML for 
transferring data from the server. 
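As a rough illustration of the parsing step, the sketch below reads a provider response into a list of plain records, once from XML and once from the equivalent JSON payload that the authors plan to consider. The element and key names (`content`, `item`, `type`, `url`) and the URLs are hypothetical, since the actual SOMARA response schema is not given; the real client is native iOS code, so Python is used here only to show the idea.

```python
import json
import xml.etree.ElementTree as ET


def parse_xml_response(xml_text):
    """Parse a hypothetical <content><item id=...> response into dicts."""
    root = ET.fromstring(xml_text)
    items = []
    for item in root.findall("item"):
        items.append({
            "id": item.get("id"),
            "type": item.findtext("type"),
            "url": item.findtext("url"),
        })
    return items


def parse_json_response(json_text):
    """Parse the equivalent (hypothetical) JSON payload into the same shape."""
    data = json.loads(json_text)
    return [
        {"id": item["id"], "type": item["type"], "url": item["url"]}
        for item in data["items"]
    ]


xml_payload = """<content>
  <item id="basket-01"><type>model3d</type><url>http://example.org/basket.obj</url></item>
  <item id="basket-video"><type>video</type><url>http://example.org/weaving.mp4</url></item>
</content>"""

json_payload = json.dumps({"items": [
    {"id": "basket-01", "type": "model3d", "url": "http://example.org/basket.obj"},
    {"id": "basket-video", "type": "video", "url": "http://example.org/weaving.mp4"},
]})

# Both formats yield the same records, so the AR application need not care
# which one the server sends.
print(parse_xml_response(xml_payload) == parse_json_response(json_payload))
```

Normalizing both formats into the same record shape is one way to keep a later XML-to-JSON switch local to the web service framework.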


Fig. 6. Mobile augmented reality client (components shown include the AR 
Application, a Content DB and the Web service framework) 


Web Service Provider: The web service provider, on an open server-side platform, 
is composed of the web service framework and the open digital content service 
providers integrated into it. The web service framework offers service interfaces to the 
mobile AR client, and the service connection module communicates with the 
integrated open service providers for processing and for dynamic media content. The 
web service provider in SOMARA is designed to supply a digital content service, a 
photogrammetry service and other services, which could come from third-party 
content providers or from any provider whose content can be usefully applied in the 
potential scenarios. Note, in particular, the RCH Cultural Objects service. 
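The routing role of the web service provider can be sketched as a registry that maps service names to integrated providers. The client only needs to know a service name, not how each provider works internally. This is a minimal illustration under stated assumptions: the class `WebServiceProvider`, the service names and the in-memory catalogue are all hypothetical stand-ins, and they are not SOMARA's server code or the RCH service interface.

```python
from typing import Callable, Dict


class WebServiceProvider:
    """Hypothetical server-side provider that routes requests to integrated services."""

    def __init__(self) -> None:
        self._services: Dict[str, Callable[[dict], dict]] = {}

    def register(self, name: str, handler: Callable[[dict], dict]) -> None:
        # Each open or third-party content provider is plugged in under a name.
        self._services[name] = handler

    def handle(self, name: str, request: dict) -> dict:
        # The mobile client only knows service names, not provider internals.
        handler = self._services.get(name)
        if handler is None:
            return {"status": "error", "reason": f"unknown service: {name}"}
        return {"status": "ok", "result": handler(request)}


def digital_content_service(request: dict) -> dict:
    # Stand-in for a repository lookup such as a cultural objects service.
    catalogue = {"basket": ["basket.obj", "weaving.mp4"]}
    return {"items": catalogue.get(request.get("query", ""), [])}


provider = WebServiceProvider()
provider.register("digital-content", digital_content_service)
print(provider.handle("digital-content", {"query": "basket"}))
print(provider.handle("photogrammetry", {}))  # not registered in this sketch
```

Adding a new open content provider then only means registering another handler, which reflects the openness the architecture aims for.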

Multiple Object Tracking: Typical AR applications can track and recognize only 
one object, and they present only a single content item on top of the real scene. In 
this architecture, the mobile AR client performs multiple 3D object tracking and 
visualizes associated content such as 3D models, billboards, images and videos on the 
reference objects at the same time. Fig. 7 presents the process of multiple object 
tracking and associated content configuration. 
Fig. 7. Multiple objects tracking configuration 
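The content configuration for multiple object tracking can be pictured as a mapping from each reference object to a list of content items. For each camera frame, the items of every object currently tracked are shown. The sketch assumes that the tracker reports an integer ID for each recognized reference object. This is an assumption for illustration, and the class and field names are hypothetical rather than Metaio SDK types.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Set


@dataclass
class ContentItem:
    kind: str  # e.g. "model3d", "billboard", "image", "video"
    uri: str


@dataclass
class TrackingConfiguration:
    # Hypothetical mapping from a reference object's tracking ID to the content
    # attached to it; one object may carry several items at once.
    contents: Dict[int, List[ContentItem]] = field(default_factory=dict)

    def attach(self, object_id: int, item: ContentItem) -> None:
        self.contents.setdefault(object_id, []).append(item)

    def visible_items(self, tracked_ids: Set[int]) -> Dict[int, List[ContentItem]]:
        # Return the content for every object tracked in the current frame,
        # so that several reference objects are augmented simultaneously.
        return {oid: items for oid, items in self.contents.items() if oid in tracked_ids}


config = TrackingConfiguration()
config.attach(1, ContentItem("model3d", "basket.obj"))
config.attach(1, ContentItem("video", "weaving.mp4"))
config.attach(2, ContentItem("billboard", "label.png"))

frame = config.visible_items({1, 2})
print({oid: [item.kind for item in items] for oid, items in frame.items()})
```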


4 Augmented Reality Application 


The AR application is a middleware system in the mobile client that contains 
important supporting modules working closely with the web service framework, the 
AR SDK and the mobile interface. The modules designed for the AR application are 
the photogrammetry request, digital content request (e.g. the RCH cultural objects), 
personalization and AR browser services. 

Photogrammetry Service Request: The photogrammetry service enriches the 
functionality of the mobile AR client by enabling mobile users to request image-based 
reconstruction. The photogrammetry service request module handles connections 
between the AR application and a web service provider for requesting 
photogrammetry services and receiving responses (i.e. a 3D model of a cultural 
object). The AR application captures photos of the intended object and then transfers 
them via the web service framework to the provider. When the final model is 
complete, it is sent back to the mobile client via the same web service, and it is then 
visualized and manipulated in the working scene. Note that this requires existing 
photogrammetry services, such as Autodesk 123D, to adopt a web services approach. 
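The request flow above (upload photos, wait, receive a model) can be sketched as a submit-then-poll loop, because reconstruction may take far longer than a single HTTP request. In this minimal sketch, `submit` and `poll` are hypothetical callbacks that stand in for the web service framework. They are not Autodesk 123D's API, and the fake provider in the usage part exists only for demonstration.

```python
import time
from typing import Callable, List, Optional


def request_reconstruction(
    photos: List[bytes],
    submit: Callable[[List[bytes]], str],
    poll: Callable[[str], Optional[bytes]],
    interval_s: float = 5.0,
    max_polls: int = 120,
) -> bytes:
    """Upload photos, then poll until the provider returns a finished model.

    `submit` (hypothetical) sends the photos and returns a job ID; `poll`
    (hypothetical) returns the model bytes once the job is done, or None
    while reconstruction is still running.
    """
    job_id = submit(photos)
    for _ in range(max_polls):
        model = poll(job_id)
        if model is not None:
            return model
        time.sleep(interval_s)
    raise TimeoutError(f"reconstruction job {job_id} did not finish")


# Usage with a fake provider that finishes on the third poll.
calls = {"n": 0}


def fake_submit(photos: List[bytes]) -> str:
    return "job-42"


def fake_poll(job_id: str) -> Optional[bytes]:
    calls["n"] += 1
    return b"OBJ-DATA" if calls["n"] >= 3 else None


model = request_reconstruction([b"img1", b"img2"], fake_submit, fake_poll, interval_s=0)
```

Polling keeps the mobile client in control of the connection. A push notification from the provider would be an alternative design, but the source does not say which approach is planned.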
Digital Content Request: The digital content request module is used when mobile 
users want to search the web service provider for other relevant content. The content 
is sent back to the mobile client so that users can use it in the AR environment. The 
content providers could be third-party or existing open service providers able to 
supply various kinds of digital content, which mobile users can place in their preferred 
environments; these environments can later be viewed on the AR browser under other 
conditions. 
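In a REST-style framework, a content search request can be as simple as a GET URL with query parameters. The sketch below builds such a URL using only the standard library. The endpoint `https://content.example.org/api/search` and the parameters `q`, `type` and `limit` are hypothetical placeholders, not the interface of RCH or any real provider.

```python
from urllib.parse import parse_qs, urlencode, urlsplit

BASE_URL = "https://content.example.org/api/search"  # hypothetical endpoint


def build_content_query(keywords, content_types=None, limit=20):
    """Build a GET URL for a hypothetical content search service."""
    params = {"q": " ".join(keywords), "limit": limit}
    if content_types:
        params["type"] = ",".join(content_types)
    return BASE_URL + "?" + urlencode(params)


url = build_content_query(["wicker", "basket"], ["model3d", "video"])
print(url)
# Decoding the query string recovers the original parameters.
print(parse_qs(urlsplit(url).query))
```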

Augmented Reality Environment Personalization: The AR environment 
personalization module enables mobile users to create their own interactive AR 
environments in the working scene by selecting, manipulating and placing preferred 
content, and by choosing the locations of reference objects or real-world scenes. This 
module also allows mobile users to save the created AR environments in XML/JSON 
formats, including 3D content provided by the museum or the photogrammetry 
service. 
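Saving a personalized environment amounts to serializing which content is anchored to which reference object and where it is placed. The sketch writes one environment as JSON and as XML and reads the JSON back. The field names (`reference_object`, `uri`, `position`, `scale`) and the layout are hypothetical choices for illustration, not a format defined by the authors.

```python
import json
import xml.etree.ElementTree as ET
from dataclasses import asdict, dataclass
from typing import List, Tuple


@dataclass
class PlacedContent:
    reference_object: str   # ID of the tracked object or marker the content is anchored to
    uri: str                # content source, e.g. a museum or photogrammetry model
    position: List[float]   # x/y/z offset from the reference object
    scale: float = 1.0


def environment_to_json(name: str, items: List[PlacedContent]) -> str:
    return json.dumps({"name": name, "items": [asdict(i) for i in items]})


def environment_from_json(text: str) -> Tuple[str, List[PlacedContent]]:
    data = json.loads(text)
    return data["name"], [PlacedContent(**i) for i in data["items"]]


def environment_to_xml(name: str, items: List[PlacedContent]) -> str:
    root = ET.Element("environment", name=name)
    for i in items:
        el = ET.SubElement(root, "content", reference=i.reference_object,
                           uri=i.uri, scale=str(i.scale))
        el.set("position", " ".join(str(v) for v in i.position))
    return ET.tostring(root, encoding="unicode")


scene = [
    PlacedContent("basket-01", "basket.obj", [0.0, 0.1, 0.0]),
    PlacedContent("basket-01", "weaving.mp4", [0.2, 0.0, 0.0], scale=0.5),
]
json_text = environment_to_json("my-basket", scene)
xml_text = environment_to_xml("my-basket", scene)
print(xml_text)
```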

Augmented Reality Browser: The AR browser presents the AR preferences saved in 
the user's profile in standard representation formats, namely XML and JSON. The AR 
browser requires object tracking and a GPS module to reveal a user's preferences. The 
browser supports both indoor and outdoor use by tracking the proposed objects or the 
user's location, so that the application can then provide a saved AR environment. 
Fig. 8 illustrates the Test Wicker Basket Object with associated media content in an 
AR scene, using the iPhone 5 as the mobile AR interface. It shows a video on how to 
make wicker baskets, the object label, a test description and a series of related objects. 
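For outdoor use, one simple way to decide which saved environments to offer is to compare the user's GPS fix with the location stored for each environment. The sketch uses the haversine great-circle distance on a spherical Earth (mean radius 6,371 km). The 100 m radius, the environment names and the coordinates are hypothetical values for illustration.

```python
import math
from typing import List, Tuple

EARTH_RADIUS_M = 6_371_000  # mean Earth radius; spherical approximation


def haversine_m(lat1: float, lon1: float, lat2: float, lon2: float) -> float:
    """Great-circle distance in metres between two latitude/longitude points."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = math.radians(lat2 - lat1)
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(a))


def nearby_environments(user: Tuple[float, float],
                        saved: List[Tuple[str, float, float]],
                        radius_m: float = 100.0) -> List[str]:
    """Names of saved environments within radius_m of the user, nearest first."""
    hits = []
    for name, lat, lon in saved:
        d = haversine_m(user[0], user[1], lat, lon)
        if d <= radius_m:
            hits.append((d, name))
    return [name for _, name in sorted(hits)]


saved = [("far", 1.0, 0.0), ("near", 0.0005, 0.0), ("nearer", 0.0001, 0.0)]
print(nearby_environments((0.0, 0.0), saved))
```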


Fig. 8. iPhone 5 AR presentation of a Test Wicker Basket Object 
5 Conclusions 


The SOA for a multiple object tracking AR system supports dynamic content 
acquisition over a wireless or mobile network. The mobile client is composed of 
modules that access open services through a web service framework. Planned services 
include a photogrammetry service based on existing services, such as Autodesk 123D, 
that enables mobile users to obtain virtual models from image-based reconstruction, so 
that users can design what they want to visualize on the screen. Another aim of this 
architecture is to support the use of open content, so that acquired content can be 
freely selected and placed in AR environments together with other relevant content, 
e.g. location, geographic or social media data. User-preferred AR environments can 
then be viewed in other situations by augmenting reference 3D objects, markers, or 
markerless targets such as 2D images. The system is illustrated using the Reanimating 
Cultural Heritage web services, which access digital cultural objects from the RCH 
repository; we use a Test Wicker Basket Object and associated 3D content for 
illustration. It is instructive to note that even large collections of cultural 3D objects 
are not yet mainstream in museum interactive environments, and the use of AR in this 
context is also rare. We feel that what is effectively ‘user-generated 3D content’ is 
worthy of further exploration in this context. 


Future Work. Future work will include the notion of ‘crowdsourcing’ the generation 
of high-quality 3D content to associate with a digital heritage repository, such as 
RCH, so that over time every object on display in a museum's gallery (virtual 
museum) could potentially have an online 3D presentation gathered through an AR 
application such as the one discussed in this paper. This will, however, require a 
‘mind shift’ from museums, which tend not to allow visitors to take photographs! 
Abstract. This work presents the results of an experiment performed with freshman 
students of the Industrial Engineering degree at Las Palmas de Gran Canaria 
University, aimed at improving their spatial abilities. Research on spatial abilities 
shows a great lack of uniformity in the terminology adopted, as a consequence of 
different approaches, the researchers' fields of study and the scale of the research. 
However, all studies agree on the relationship between a high level of spatial ability 
and the likelihood of success in certain professional careers and university degrees, 
such as engineering, which is our case. The pilot study described in this paper aims to 
improve the Spatial Orientation component of spatial abilities. To this end, we 
conducted two training experiences based on the sport of orienteering: one was 
performed in a real environment and the other in a virtual environment. The results 
show that this component can be trained and improved in both environments, with no 
significant difference between the two types of training. 


Keywords: Spatial abilities, Spatial orientation, Environmental scale, Orienteering, 
Virtual worlds. 


1 Introduction 


Most of our sensations, and much of what we experience or learn, are acquired 
through the visual system. Our world could not be understood without graphic 
sketches, which have been drawn since prehistory and are now used to design every 
product and service demanded by an increasingly technological society. 

Spatial vision is understood as the ability to visualize and manipulate objects in our 
minds. It is not only an important skill widely recognized in the engineering field, but 
it is also highly regarded in many other fields. For instance, van der Geer points out 
that spatial vision is important for success in fields such as Biology, Chemistry, 
Mathematics and Natural Science [1]. 
To predict success in university studies, some universities commonly consider 
academic records and, in the case of engineering degrees, physics or mathematics 
grades. Some studies have revealed a correlation between spatial ability and success 
in other engineering fields. 

One of the problems we found while reviewing the literature on spatial abilities is 
the contradiction regarding their definition. The same term may be used for different 
components, and different terms may be given the same description. Moreover, there 
is no common agreement about the number of components of this ability, which varies 
between two and ten depending on the author. Despite the lack of agreement about the 
definition of this concept, we highlight the one by Linn & Petersen [2]: the “skill in 
representing, transforming, generating, and recalling symbolic, non-linguistic 
information”. 

The structure of the components of spatial ability has been studied since the 1940s. 
Spatial ability has commonly been considered to be composed of three components. 
In the works of Linn & Petersen [2] and Lohman [3], the three components are spatial 
perception, spatial rotation (also called orientation) and spatial visualization. Spatial 
perception measures a person's ability to sense horizontality and verticality; spatial 
rotation indicates the ability to quickly rotate two-dimensional figures as well as 
three-dimensional objects in the imagination; and spatial visualization assesses the 
ability to manipulate the spatial information of simple objects through complex 
operations. 

Other researchers, such as McGee [4] and Maier [5], propose five main components: 
spatial relations, spatial perception, spatial visualization, mental rotation and spatial 
orientation. Unlike the previous classification, this one distinguishes mental rotation 
from spatial orientation. 

We found two issues while reviewing the literature on the concept of spatial abilities. 
First, the studies do not provide consistent results: while many studies include spatial 
orientation in their classifications [4-6], others do not [7], and even among those that 
include it, there is no common agreement on its definition. Second, these studies do 
not pay attention to the dynamic and environmental components, which are 
considered quite important factors in the field of spatial abilities [8]. 

Another factor to consider is the environmental scale at which spatial abilities 
should be tested [9,10]. Montello [11] proposes that, because the human motor and 
perceptual systems interact with space differently depending on scale, different 
psychological systems are involved in processing spatial information at different 
scales. At a large scale, we cannot obtain all the spatial information about the natural 
and artificial elements that make up an individual's environment. This kind of ability 
is very important for movement and navigation. In this sense, Gary L. Allen [12] 
proposes another taxonomy that considers both dynamic and spatial components and 
sorts spatial abilities into three functional families. The first answers the question 
‘What is this?’ and covers the identification and manipulation of small, static objects, 
as when solving a written paper test. The second answers ‘Where is it?’ and includes 
situations in which the individual and/or the object may be moving or stationary, as 
when calculating a ball's trajectory. The third answers ‘Where am I?’ and refers to an 
individual moving through a large-scale environment full of static objects such as 
buildings or vegetation. 


2 Aim 


Our aim is to carry out training to improve the spatial orientation component of the 
spatial abilities of freshman students enrolled in the engineering graphics subjects of 
the engineering degrees taught at the Civil and Industrial Engineering School of Las 
Palmas de Gran Canaria University. If this training is successful, it will help students 
overcome difficulties in this subject and obtain better academic results. 


3 Hypothesis 


To date, only a few studies have attempted to relate spatial abilities at different 
scales, i.e., to determine whether the results of paper-based psychometric tests of 
spatial skills can predict success in large-scale tasks [13, 14]. Regardless of any such 
correlation, in this work we treat spatial orientation as a component of spatial ability 
and try to improve it through training designed specifically for this experiment. If 
spatial orientation improves, this should help students better understand the Graphic 
Design subject in engineering. 

We chose the training with a focus on tasks performed in large-scale environments, 
so we opted for an orienteering race. In addition, we wanted to evaluate the results of 
this training in two kinds of environments: a real one and a virtual one. Our 
hypothesis is that this training improves the spatial orientation of the students. 


4 Participants 


The participants were 79 freshman students in the Industrial Engineering degree at 
Las Palmas de Gran Canaria University. The mean age and standard deviation (SD) 
were 18.8 (1.3) years for men and 18.8 (1.2) years for women, with ages ranging from 
18 to 24 in both cases. For the training, they were split into two homogeneous groups: 
30 of them trained in a real environment and 33 in a virtual one. 

The experiments were performed in the first week of the first semester of the 
2012-2013 academic year, so no student had attended classes in any Graphic Design 
subject in engineering before the training. None of them had ever taken part in an 
orienteering race either. Spatial orientation was measured before and after the 
experiment using a reliable measurement tool: the Perspective Taking/Spatial 
Orientation Test developed at the University of California, Santa Barbara by Mary 
Hegarty et al. [15,16]. 
5 Experience 1: Real World Orienteering 


Thirty participants took part in this experiment (10 women and 20 men). The mean 
age and standard deviation (SD) were 18.7 (1.1) years, with ages ranging from 18 to 
24. None of them reported any previous experience in orienteering. To encourage 
participation, the three participants with the best times received an upgrade to their 
marks, provided their mark was 4.5 or higher. 

The experiment consisted of two phases. In the first, a 45-minute classroom session, 
an orienteering expert explained the basics of the sport with emphasis on the use of 
the compass. There were some relevant changes with respect to a usual orienteering 
race. During the race there were no geographic features, buildings or vegetation that 
could be used as references. Only distances and relative angles were available, so the 
students were shown on site how to measure distances by pacing, using the map's 
scale. In addition, since this was not a physical test, they were instructed not to run: 
speed was meant to depend on the ability to orient oneself, not on fast movement. 


Fig. 1. Left. Expert giving instructions. Right. Students measuring distances through steps. 


When maps are used, orienteering depends on the ability to interpret them (spatial 
relations are established through symbols) and on the ability to relate the map to the 
terrain and vice versa. 

We also considered some common actions from this sport: 


Carrying the map and other race material (compass, control card and control 
descriptions) 
Map orientation 
Map reading 
Choosing the right route 
Deciding on the most suitable technique 


In our experiment we omitted some of these actions to reinforce the intended 
purpose of the race. The experiment took place on a football field, where the only 
elements available for establishing spatial relations were architectural ones. However, 
given the simplicity of a football field and the aim of training the sense of orientation, 
we removed even these elements by giving the students a ‘blind map’ that showed 
only the relative layout of the beacons, by angle and distance, together with 
geographic north. 


Fig. 2. Left. Football field’s map including beacons. Right. ‘Blind map’ provided to the participants. 


For the second phase, the orienteering race proper, two courses of similar length 
were designed. The students were told that the two courses were different so that they 
would not follow one another; thus, they did not know which course had been taken 
by the student who entered the field before them. Each student completed both 
courses, so that the results obtained did not depend on pure luck. 

To time this test we used the SPORTident system, which is based on the 
SPORTident-Card, a device similar to a USB pen drive. During the race, the system 
records the times and numeric codes of the control points. The output provides 
individual records of both split (intermediate) and total times. 
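The split and total times reported by the timing system can be derived from the sequence of control punches on the card. The sketch assumes a simplified record of `(control_code, seconds)` pairs and an expected control order for the assigned course. This format is an assumption and not SPORTident's actual download format. The sketch also treats any deviation from the course order as a mispunch, which is stricter than real event software.

```python
from typing import List, Tuple


def split_times(start_s: int,
                punches: List[Tuple[int, int]],
                course: List[int]) -> Tuple[List[int], int]:
    """Compute split and total times from simplified punch records.

    punches: (control_code, time_in_seconds) in the order they were read.
    course:  expected control codes, in order, for the assigned course.
    Raises ValueError if the punched codes do not match the course exactly.
    """
    codes = [code for code, _ in punches]
    if codes != course:
        raise ValueError(f"mispunch: expected {course}, got {codes}")
    splits, previous = [], start_s
    for _, t in punches:
        splits.append(t - previous)  # time taken since the previous control
        previous = t
    return splits, previous - start_s  # total = last punch minus start


# Hypothetical card read-out for one student on one course.
splits, total = split_times(0, [(31, 60), (32, 150), (33, 200)], [31, 32, 33])
print(splits, total)
```

Since each student ran both courses, the same function would simply be applied once per course.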


Fig. 3. Total and partial (split) time records 


6 Experience 2: Virtual Orienteering 


Thirty-three students took part in this experiment (12 women and 21 men). The 
mean age and standard deviation were 18.8 (1.4) years, with ages ranging from 18 to 
24. As in the previous case, no participant had any experience in orienteering, and 
participation was again encouraged with a mark upgrade under the same conditions as 
in the previous experiment. 

The first phase of this experiment took around 90 minutes and again consisted of a 
classroom explanation of the basics of orienteering and the use of the compass, as well 
as guidance on using the software. 

We used the demo version of the Catching Features program, which, despite being 
free, included every feature we needed: 

http://www.catchingfeatures.com/ 

Catching Features is an orienteering game in which one or more players are 
immersed in a virtual environment. Players take part in several races using a 
topographic map with its legend and a compass, as in a real race. 


Fig. 4. Catching Features settings 


The game can be played in single-player mode or in multiplayer mode against other 
participants. The full version also offers online games. 

The settings are quite realistic, as players can move and look in every direction. 
Each beacon is located using the map legend and the compass until the course is 
complete. 

In our experiment, we asked the students to complete a minimum of six races. They 
had one week to finish them and ran them at home on their own computers, after 
downloading the game's free trial version. 

Catching Features can output the results of different races in several formats, so 
we asked the students for a file with their full results in order to evaluate their 
performance in the experiment and to apply the incentives. 


Fig. 5. Student results 
7 Measures and Results 
Table 1 shows the results obtained by the students in the Perspective Taking Test 
before and after training for the three groups: Real Orienteering, Virtual Orienteering 
and Control Group. Mean values prior to training are quite similar in all three groups. 


Table 1. Pre-Test/Post-Test Values and Gain Scores 

Group                          Pre-Test   Post-Test   Gain 
Real Orienteering Group        42.33      18.75       23.58 
Virtual Orienteering Group     54.77      24.00       30.77 
Control Group                  54.29      45.65        8.64 
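The Gain column in Table 1 equals the pre-test value minus the post-test value. This implies that lower scores are better, which is consistent with the Perspective Taking/Spatial Orientation Test commonly being scored as a mean angular error. The short check below recomputes the gains from the table values and only verifies that arithmetic. It does not use any per-student data.

```python
# Mean Perspective Taking/Spatial Orientation Test scores from Table 1.
table1 = {
    "Real Orienteering Group": (42.33, 18.75),
    "Virtual Orienteering Group": (54.77, 24.00),
    "Control Group": (54.29, 45.65),
}

# Gain = pre-test minus post-test, since a lower post-test score is better.
gains = {group: round(pre - post, 2) for group, (pre, post) in table1.items()}
print(gains)
```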


An analysis of variance (ANOVA) was carried out on all data obtained from the three 
groups in the Perspective Taking/Spatial Orientation Test. It showed no statistically 
significant difference between the groups prior to training, so the three groups were 
statistically equivalent in spatial orientation at the beginning of this study. 
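A one-way ANOVA compares the variation between group means with the variation within groups. The resulting F statistic is then checked against the F distribution with (k − 1, N − k) degrees of freedom, where k is the number of groups and N the total number of participants. The per-student scores are not published, so this sketch runs on made-up numbers and does not reproduce the study's result.

```python
from statistics import mean
from typing import List


def one_way_anova_f(groups: List[List[float]]) -> float:
    """F statistic of a one-way ANOVA: between-group over within-group mean square."""
    all_values = [x for g in groups for x in g]
    grand_mean = mean(all_values)
    k, n = len(groups), len(all_values)
    # Spread of the group means around the grand mean, weighted by group size.
    ss_between = sum(len(g) * (mean(g) - grand_mean) ** 2 for g in groups)
    # Spread of individual scores around their own group mean.
    ss_within = sum((x - mean(g)) ** 2 for g in groups for x in g)
    ms_between = ss_between / (k - 1)
    ms_within = ss_within / (n - k)
    return ms_between / ms_within


# Illustrative data only; a large F means group means differ more than chance suggests.
print(one_way_anova_f([[1, 2, 3], [4, 5, 6]]))
```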

We compared the pre-test and post-test mean values using a paired-samples Student's 
t-test. For the real orienteering group, t = 8.08 (p < 0.01); for the virtual orienteering 
group, t = 11.90 (p < 0.01); and for the control group, p = 0.27. 
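The paired t-test works on each student's own pre/post difference. Its statistic is the mean difference divided by the standard error of the differences, with n − 1 degrees of freedom. The sketch defines the difference as pre − post, so a positive t indicates that scores fell, which means an improvement on this test. The data are hypothetical, and the p-value lookup, which needs the t distribution, is left out.

```python
import math
from statistics import mean, stdev
from typing import List


def paired_t(pre: List[float], post: List[float]) -> float:
    """Paired-samples Student's t statistic for pre - post differences."""
    if len(pre) != len(post) or len(pre) < 2:
        raise ValueError("need two equal-length samples with at least two pairs")
    diffs = [a - b for a, b in zip(pre, post)]
    # Standard error of the mean difference uses the sample standard deviation.
    return mean(diffs) / (stdev(diffs) / math.sqrt(len(diffs)))


# Hypothetical scores for three students; degrees of freedom = 3 - 1 = 2.
print(paired_t([10, 12, 14], [8, 9, 10]))
```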

Both groups that performed training showed a statistically significant improvement 
in their spatial orientation levels, since their p-values are below the 5% significance 
level. In contrast, the results show no significant improvement in spatial orientation 
for the control group. 

To check whether there are differences between the groups, we carried out 
Scheffé's multiple comparison test. 


Table 2. Groups comparison 

(I)     (J)     Mean difference   Std.      Sig.   95% confidence interval 
group   group   (I-J)             error            Lower limit   Upper limit 
1       2        -7.18848         3.49433   .128   -15.9131        1.5361 
1       3        18.52125(*)      4.28814   .000     7.8146       29.2279 
2       1         7.18848         3.49433   .128    -1.5361       15.9131 
2       3        25.70973(*)      4.21981   .000    15.1738       36.2457 
3       1       -18.52125(*)      4.28815   .000   -29.2279       -7.8146 
3       2       -25.70973(*)      4.21981   .000   -36.2457      -15.1738 

(*) The difference between mean values is significant at the .05 level. 
Groups: 1 (Real Orienteering), 2 (Virtual Orienteering), 3 (Control Group). 
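The confidence intervals in Table 2 follow the Scheffé construction. The half-width is sqrt((k − 1) · F_crit) times the standard error, where F_crit is the upper 5% critical value of F with (k − 1, N − k) degrees of freedom. A difference is significant when its interval excludes zero. The sketch reproduces the first two rows of Table 2 under two assumptions. First, all 79 participants entered the comparison, so df = (2, 76). Second, F_crit ≈ 3.117 is an approximate value read from standard F tables, because the Python standard library has no F quantile function.

```python
import math
from typing import Tuple


def scheffe_interval(mean_diff: float, std_error: float, k: int,
                     f_crit: float) -> Tuple[float, float]:
    """Scheffé simultaneous confidence interval for one pairwise difference.

    Half-width = sqrt((k - 1) * f_crit) * std_error, where f_crit is the
    critical F value with (k - 1, N - k) degrees of freedom.
    """
    half_width = math.sqrt((k - 1) * f_crit) * std_error
    return mean_diff - half_width, mean_diff + half_width


# Assumed: k = 3 groups, F(2, 76) at the .05 level taken as approximately 3.117.
low, high = scheffe_interval(-7.18848, 3.49433, k=3, f_crit=3.117)   # row 1 vs 2
low13, high13 = scheffe_interval(18.52125, 4.28814, k=3, f_crit=3.117)  # row 1 vs 3
print(round(low, 2), round(high, 2))      # interval contains 0: not significant
print(round(low13, 2), round(high13, 2))  # interval excludes 0: significant
```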
The results show a significant difference between the control group and the groups 
that performed the training. In contrast, there is no significant difference between the 
groups trained in the real and virtual environments. This means that the improvement 
in spatial orientation is similar with either type of training. 


8 Conclusions 


Although the results described in the previous section show that students improved
their spatial orientation with either training, we must consider certain questions as we
plan the experience for subsequent academic courses. However precise and realistic the
simulation of the environment is, we must consider the fact that people are used to
moving through a real environment. Regardless of the visual information available, the
field of view in the real world is much wider than in the virtual one, so much less spatial
information is available in the virtual environment. Moreover, the update of our body's
location may come not only from the visual system but also from the kinesthetic one.
Besides, while a real environment requires self-directed movement of our body, the
experience in the virtual one is far more passive.

From the teacher’s point of view, the virtual experience was easier to prepare and
carry out. Apart from learning the basics of orienteering races and handling the
Catching Features app, there was no other relevant issue. However, preparing the
experience in a real environment, held at a football field, required not only great
organizational capability but also bringing together services and staff from the
University of Las Palmas de Gran Canaria.

From the students’ point of view, we received positive feedback. Both tests appealed to
them, and they performed them enthusiastically and willingly, although, as previously
mentioned, they were offered a grade increase if they finished in the top three. We
were surprised by how positively the real field test, the orienteering race, was received
and assessed. The interaction component and the interpersonal competition got students
so involved that they wanted to perform similar tests and showed great interest in
spatial abilities within their curricula.
system available for acquiring all data from the test. His vast experience in this sport
Abstract. In this paper we present an approach that makes it possible to stage
choreographies for education and training purposes in potentially any virtual world
platform. A choreography is seen here as the description of a set of actions that must
or may be executed by a group of participants, including the goals to be achieved and
any restrictions that may exist. We present a system architecture and the formalization
of a set of processes that are able to transform a choreography from a
platform-independent representation into a specific virtual world platform's
representation. We adopt an ontology-based approach with distinct levels of abstraction
for capturing and representing multi-actor and multi-domain choreographies to be staged
in virtual world platforms with distinct characteristics. Ontologies are characterized
according to two complementary dimensions, the choreography's domain (independent
and dependent) and the virtual world platform (independent and dependent), giving rise
to four ontologies. Ontology mappings between these ontologies enable the automatic
generation of a choreography for virtually any target virtual world platform, thus
reducing the time and effort of choreography development.


Keywords: virtual worlds, training, choreography, multi-user, model-driven, 
ontology, mapping. 


1 Introduction 


Virtual worlds have attracted significant interest for supporting teaching and learning
activities [1], [2], since they enable the creation of immersive environments where
multiple members of a team sharing a common virtual space can develop competencies
in a simulated context [3], [4]. Choreographies of virtual actors are a specific type of
content that represents the set of actions that can be performed simultaneously by
human users and virtual computer-controlled actors, thus enabling human
trainees/students to play roles as part of teams or within a simulated social context. In
this sense, a choreography is the description of a set of actions that must or may be
executed by a group of participants, including the goals to be achieved and any
restrictions that may exist.
Because designing a choreography is a resource-intensive effort, it would be desirable
for the result not to be tied to a specific virtual world platform (VWP) but rather to be
deployable in any VWP. However, as virtual platforms are very heterogeneous in terms
of, e.g., functionalities, data models, execution engines and programming/scripting
languages or APIs, deploying a choreography built for one platform into another VWP
is difficult and time-consuming [5]-[8].

We believe that the approach presented in this paper contributes to facilitating the
development, sharing and adaptation of choreographies intended to be staged in
different virtual platforms. To this end, we suggest an approach where the conceptual
representation model of the choreography is captured in the form of ontologies, and its
adaptation to a particular virtual world follows a set of model transformation processes,
similar to those suggested by the Model Driven Architecture (MDA) paradigm [9]. The
proposed ontology-based definition of choreographies can capture not only physical
aspects such as objects but also more complex content such as procedures, consisting
of sets of actions and the conditions under which the actors can perform them.

Thus, this paper presents an approach that deals with the design and representation of
platform-independent multi-user choreographies and their deployment to different
VWPs with minimal effort and time, using a set of transformation processes based on
ontologies and alignments. The rest of the paper comprises four more sections. Section
2 presents the proposed approach and describes the system architecture. Section 3
describes the experiments carried out. Section 4 compares related work with the
proposed ideas. Finally, Section 5 summarizes the proposal and suggests future
directions.


2 Proposed Approach 


To deal with VWPs with different characteristics, we argue that choreographies should
be clearly separated from the technical characteristics of their execution in the VWP.

To this end, the core of the proposal is a “generic high-level ontology” that captures the
choreography in a conceptual and abstract fashion, so that it is independent of the
staging/deployment VWP. The data model of every virtual world must then be
captured/represented by a so-called “platform-specific ontology”, and a mapping
between the generic high-level ontology and the platform-specific ontology must be
defined. The mapping provides the means to transform the platform-independent
choreography into a platform-dependent choreography.

To address this, the MDA software-development paradigm [9] is adopted and adapted.
MDA specifies three default models of a system corresponding to different layers of
abstraction: a Computation Independent Model (CIM), the most abstract, which
represents the view of the system without any computational complexities; a Platform
Independent Model (PIM), which describes the behavior and structure of the system
but without technological details; and a Platform Specific Model (PSM), which
combines the specifications in the PIM with the details of a specific platform.

Based on the MDA concepts of model independence and model transformation, we
adopt an approach based on two first-class dimensions: the VWP dimension and the
choreography's domain dimension. Unlike in MDA, in our approach the model is
independent not only of the VWP but also of the (choreography's) domain. Fig. 1
depicts the one-dimensional MDA approach (Fig. 1a) in comparison with the envisaged
two-dimensional approach (Fig. 1b).

The nomenclature $O_{P_xD_x}$ refers to Ontology, Platform and Domain, with “x”
taking the values “d” and “i” for “dependent” and “independent”, respectively. E.g.,
$O_{P_iD_d}$ stands for “Ontology, Platform-independent, Domain-dependent”.

Taking into account the characteristics of ontologies, and considering that they are the
best way to represent conceptual information in order to bring intelligent systems
closer to the human conceptual level [10], we claim that ontologies are the appropriate
knowledge representation model for bridging the gap between human requirements and
computational requirements [10]-[13], and are thus able to play the role of both the
Computation-Independent Model (CIM) and the Platform-Independent Model (PIM).
Following MDA, the ontology-based choreography representations are successively
transformed through a set of processes until the final, platform-specific choreography
($O_{P_dD_d}$) that is executed in the specific VWP. The proposed architecture is
depicted in Fig. 2 and comprises four ontologies and five processes.


Fig. 2. The system architecture with the processes of authoring, mappings, transformation and
execution


2.1 Ontologies 


The following four ontologies are representational models of the different choreogra- 
phy abstractions: 


• $O_{P_iD_i}$ (platform-independent and domain-independent) is the generic high-level
ontology representing the core concepts of a choreography independently of any
implementation environment, also designated the foundational ontology. Fig. 3
presents its current status; its motivations and design decisions have been described
in a previous paper [14].

• $O_{P_dD_i}$ (platform-dependent and domain-independent) represents the core concepts of
a choreography for a specific VWP. Although this is a platform-dependent ontology, it
remains independent of any application domain and is therefore developed only once
(possibly requiring adaptations as the VWP evolves). Each virtual world can have its
own interpretation of a choreography, describing the concepts in its own way to best
fit its characteristics, but this ontology must capture and represent every concept
corresponding to those defined in the foundational ontology, so that it captures the
same semantic knowledge and semantic relations can be established between the two.
Thus, to apply this approach, an ontology must be developed for each target VWP to
represent that virtual world platform's particular interpretation of the foundational
ontology. Moreover, this ontology can incorporate additional concepts that reflect the
specific characteristics of that virtual world, using its own terminology.
Fig. 3. The Foundational Choreography Ontology ($O_{P_iD_i}$): representation of concepts,
properties and relations between concepts
• $O_{P_iD_d}$ (platform-independent and domain-dependent) is a choreography resulting
from the authoring of $O_{P_iD_i}$. It captures the representation of a complete
choreography for a specific application domain, without any concern for the technical
specificities of any platform.

• $O_{P_dD_d}$ (platform-dependent and domain-dependent) is a choreography represented
in/for a specific VWP. This is the choreography that is intended to serve as a reference
for staging in the virtual world.


2.2 Processes 


In the proposed architecture we apply five processes to conduct a successive trans- 
formation of models representing the various abstractions of a choreography to adapt 
its specification to a particular virtual world. 

To illustrate the explanation, we will consider a generic Instant Messaging plat- 
form for which a choreography will be adapted. This is a very simple platform (de- 
veloped by the authors for testing purposes, cf. Fig. 7) where there is only the repre- 
sentation of an action called write. 


Authoring. Authoring is a user-based process in which the choreographer authors a
domain-dependent choreography in the form of an ontology. This process is typically
performed by an expert who builds the ontology manually. However, end-user tools can
also be developed to allow people without knowledge of or training in ontologies to
specify the choreography through simple and intuitive interfaces. That is, end-user
tools can build the ontologies transparently on behalf of the users, facilitating the
authoring process.

The foundational ontology ($O_{P_iD_i}$) is extended and semantically refined to describe
the choreography entities specific to an application domain, giving rise to $O_{P_iD_d}$.
The authoring process must ensure that the set of changes applied does not change the
semantics defined in $O_{P_iD_i}$. For that, the following assumptions must be guaranteed:


• No axioms defined in $O_{P_iD_i}$ can be removed;

• No contradictions can be added, i.e., $O_{P_iD_d}$ must be logically consistent;

• No new root elements are allowed, i.e., new entities (classes and properties) are
defined as sub-entities of those existing in $O_{P_iD_i}$ and should not create new root
elements.
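The first and third assumptions can be checked mechanically on a simple model of the ontologies. The sketch below is a hypothetical illustration, not the authors' tool. It represents each ontology as a mapping from entity name to parent entity, plus a set of axiom strings, and it flags removed axioms and new root elements. Logical consistency (the second assumption) requires a Description Logics reasoner, so the sketch deliberately leaves it out.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Ontology:
    # Hypothetical minimal model: entity -> parent entity (None marks a root),
    # plus axioms kept as opaque strings.
    parents: dict[str, Optional[str]]
    axioms: set[str] = field(default_factory=set)


def authoring_violations(foundational: Ontology, authored: Ontology) -> list[str]:
    """List violations of the 'no removed axioms' and 'no new roots' rules."""
    problems = []
    # No axiom of the foundational ontology may be removed.
    for axiom in sorted(foundational.axioms - authored.axioms):
        problems.append(f"removed axiom: {axiom}")
    for entity, parent in sorted(authored.parents.items(), key=lambda kv: kv[0]):
        if entity in foundational.parents:
            continue  # inherited entity, not newly authored
        if parent is None:
            problems.append(f"new root element: {entity}")
        elif parent not in authored.parents:
            problems.append(f"undefined parent for {entity}: {parent}")
    # Logical consistency needs a DL reasoner and is not checked here.
    return problems


foundational = Ontology(
    {"Thing": None, "Action": "Thing", "Role": "Thing"},
    {"Action SubClassOf performedBy some Role"},
)
authored = Ontology(
    {**foundational.parents,
     "ManipulateAction": "Action", "Mechanic": "Role", "Tool": None},
    set(foundational.axioms),
)
print(authoring_violations(foundational, authored))  # ['new root element: Tool']
```

Here `Tool` is a deliberately invalid, made-up entity. It shows the "no new root elements" rule being caught, while `ManipulateAction` and `Mechanic` are accepted because they extend existing concepts.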


Fig. 4 depicts an excerpt of the ontology resulting from authoring ($O_{P_iD_d}$) and,
more formally, its representation in Description Logics (DL) syntax.

New concepts are added, as well as restrictions that bound which actions each actor is
able to execute. The restrictions are based on the definition of associations between
roles (that actors can play) and actions. Commonly, the following two types of
restrictions are defined:


1. To constrain the actors allowed to perform an action based on their role, i.e., to
perform an action, the actor must have a specific, previously defined role. The
relation between the concepts referenced by the action and the role is defined by the
property performedBy;
2. To assign the role(s) of an actor based on the action(s) s/he performs. Thus,
roles are assigned dynamically during the choreography according to the actions
performed by the actor. This association can be seen from a dual perspective. Using
a small example: on the one hand, if an actor plays the role Role1 and performs
actions Action1 and Action2, s/he automatically plays the role Role2 thereafter.
On the other hand, to play the role Role2, it is a necessary condition that the actor
plays the role Role1 and performs the actions Action1 and Action2.
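The two kinds of restriction can be illustrated with plain Python sets, using the Role1/Action1/Action2/Role2 example. This is a hypothetical sketch, not the paper's reasoner-based implementation. `ALLOWED_ROLES` plays the part of the performedBy restrictions (type 1), and `ROLE_RULES` encodes the dynamic role assignment (type 2). `Action3`, an action reserved for Role2, is added only for illustration.

```python
# Type 1: roles allowed to perform each action (a performedBy restriction).
ALLOWED_ROLES = {
    "Action1": {"Role1"},
    "Action2": {"Role1"},
    "Action3": {"Role2"},  # hypothetical action reserved for Role2
}

# Type 2: (required role, required actions) -> role gained afterwards.
ROLE_RULES = [
    ("Role1", {"Action1", "Action2"}, "Role2"),
]


class Actor:
    def __init__(self, name: str, roles: set[str]) -> None:
        self.name = name
        self.roles = set(roles)
        self.performed: set[str] = set()

    def can_perform(self, action: str) -> bool:
        # Allowed only if the actor holds at least one permitted role.
        return bool(self.roles & ALLOWED_ROLES.get(action, set()))

    def perform(self, action: str) -> bool:
        if not self.can_perform(action):
            return False
        self.performed.add(action)
        # Re-evaluate the dynamic role-assignment rules after every action.
        for required_role, required_actions, new_role in ROLE_RULES:
            if required_role in self.roles and required_actions <= self.performed:
                self.roles.add(new_role)
        return True


trainee = Actor("trainee", {"Role1"})
print(trainee.can_perform("Action3"))  # False: Role2 not yet held
trainee.perform("Action1")
trainee.perform("Action2")
print(trainee.can_perform("Action3"))  # True: Role2 gained after Action1 and Action2
```

In this sketch, roles are only ever added through `ROLE_RULES`. This mirrors the "necessary condition" reading: the only way to reach Role2 is to hold Role1 and perform both actions.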
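[Figure: concept hierarchy with subclass chains Thing, Action, ManipulateAction, GrabScrewDriver and Thing, Role, Mechanic, M1, related by the property performedBy; legend: concept, property]

$$
\begin{aligned}
\mathit{Action} &\sqsubseteq \mathit{Thing} \sqcap \exists\,\mathit{performedBy}.\mathit{Role}\\
\mathit{ManipulateAction} &\sqsubseteq \mathit{Action} \sqcap \exists\,\mathit{performedBy}.\mathit{Mechanic}\\
\mathit{GrabScrewDriver} &\sqsubseteq \mathit{ManipulateAction} \sqcap \exists\,\mathit{performedBy}.\mathit{M1}\\
\mathit{Role} &\sqsubseteq \mathit{Thing}\\
\mathit{Mechanic} &\sqsubseteq \mathit{Role}\\
\mathit{M1} &\sqsubseteq \mathit{Mechanic}
\end{aligned}
$$


Fig. 4. An excerpt of the ($O_{P_iD_d}$) ontology resulting from the authoring process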


During the authoring process, the author can take advantage of the full semantic
expressiveness of DL to elaborate more complex choreographies. For example, if an
actor cannot play two given roles during the choreography, the author can specify in
DL that these two roles are disjoint.


Mapping1. Mapping1 is the process that establishes correspondences (Alignment1)
between the representations of the two choreography abstractions represented by
$O_{P_iD_i}$ and $O_{P_dD_i}$.

An alignment is a set of correspondences (semantic bridges in this paper). A se- 
mantic bridge is a set of information elements describing what entities from both 
source and target ontology are semantically related, and what conditions must hold in 
order to be considered [15]. Two types of semantic bridges are considered: 


1. Concept Bridge (CB), used to describe the semantic relations between (source and 
target) concepts; 

2. Property Bridge (PB), used to specify the semantic relations between (source and 
target) properties, either relations or attributes. 


In this process, all the core concepts (concepts that are direct subclasses of the Thing
concept, depicted in Fig. 3) of the foundational ontology ($O_{P_iD_i}$) should be mapped
to concepts of the platform-specific ontology ($O_{P_dD_i}$) to ensure full correspondence
between both ontologies. Correspondences between properties are defined between
properties of two mapped concepts. To facilitate understanding, a relation is
established between each concept-concept correspondence and its property-property
correspondences (Fig. 5).
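An alignment of this kind can be represented as a list of concept bridges, each carrying the property bridges between properties of the two mapped concepts, which mirrors the nesting described above. The class and field names, as well as the target names `IMAction`, `IMRole` and `doneBy` for the Instant Messaging platform, are hypothetical placeholders rather than names from the paper's ontologies. `Goal` stands in for a core concept that has not yet been mapped.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass(frozen=True)
class PropertyBridge:
    source_property: str  # property of the source (foundational) ontology
    target_property: str  # property of the target (platform-specific) ontology


@dataclass
class ConceptBridge:
    source_concept: str
    target_concept: str
    # Property bridges hang off the concept bridge whose concepts they relate.
    properties: list[PropertyBridge] = field(default_factory=list)


@dataclass
class Alignment:
    bridges: list[ConceptBridge] = field(default_factory=list)

    def target_concept(self, source: str) -> Optional[str]:
        for bridge in self.bridges:
            if bridge.source_concept == source:
                return bridge.target_concept
        return None

    def unmapped(self, core_concepts: set[str]) -> set[str]:
        """Core source concepts that still lack a concept bridge."""
        return core_concepts - {b.source_concept for b in self.bridges}


# Hypothetical Alignment1 towards the Instant Messaging platform ontology.
alignment1 = Alignment([
    ConceptBridge("Action", "IMAction", [PropertyBridge("performedBy", "doneBy")]),
    ConceptBridge("Role", "IMRole"),
])
print(alignment1.target_concept("Action"))  # IMAction
print(alignment1.unmapped({"Action", "Role", "Goal"}))  # {'Goal'}
```

A non-empty result from `unmapped` signals that the "full correspondence" requirement for the core concepts is not yet met.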


Mapping2. Mapping2 is the process that establishes correspondences between the
domain choreography ($O_{P_iD_d}$) and a VWP ontology ($O_{P_dD_i}$), i.e., Alignment2.
Alignment2 builds on (and extends) Alignment1, thus promoting reuse and reducing
effort (Fig. 5).
Fig. 5. A partial view of the alignments resulting from Mapping1 and Mapping2


Transformation. Transformation is the process that creates the VWP choreography
($O_{P_dD_d}$) from $O_{P_iD_d}$ and $O_{P_dD_i}$ according to Alignment2.

This is a fully automatic process that “copies” the $O_{P_iD_d}$ classes and constraints
(Fig. 4) to $O_{P_dD_d}$ (Fig. 6).

Although this is an automatic process, choreographers can intervene and edit the
resulting $O_{P_dD_d}$ ontology for additional adjustments. Thus, the $O_{P_dD_d}$ ontology
may be further fine-tuned to better fit the implementation platform.
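Mapping2 and Transformation can be sketched under strong simplifications. Each ontology is reduced to a child-to-parent dictionary. Alignment2 is a plain source-to-target dictionary that starts from Alignment1 and adds the domain concepts. The transformation copies every domain class into the platform choreography and re-parents it through the alignment. The target names `IMAction` and `IMRole` are hypothetical, as is the choice to let domain classes keep their own names. The paper's transformation also copies DL constraints, which this sketch omits.

```python
# Alignment1: foundational concepts -> platform-specific concepts (hypothetical names).
alignment1 = {"Thing": "Thing", "Action": "IMAction", "Role": "IMRole"}

# Domain choreography O_PiDd as child -> parent (classes from the Fig. 4 example).
domain = {
    "ManipulateAction": "Action",
    "GrabScrewDriver": "ManipulateAction",
    "Mechanic": "Role",
}


def mapping2(alignment1: dict[str, str], domain: dict[str, str]) -> dict[str, str]:
    """Alignment2: reuse Alignment1 and add a correspondence for each domain class."""
    alignment2 = dict(alignment1)  # Alignment1 itself is left unchanged
    for cls in domain:
        alignment2.setdefault(cls, cls)  # assumption: domain classes keep their names
    return alignment2


def transform(domain: dict[str, str], alignment2: dict[str, str]) -> dict[str, str]:
    """Copy the domain classes into O_PdDd, translating each parent via Alignment2."""
    return {alignment2[cls]: alignment2[parent] for cls, parent in domain.items()}


platform_choreography = transform(domain, mapping2(alignment1, domain))
print(platform_choreography)
# {'ManipulateAction': 'IMAction', 'GrabScrewDriver': 'ManipulateAction', 'Mechanic': 'IMRole'}
```

Because Alignment2 extends Alignment1 rather than replacing it, the platform ontology mapping is written once and reused for every new domain choreography.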
Fig. 6. An extract of the $O_{P_dD_d}$ ontology
Execution. Execution is the process that stages the choreography in a VWP through a
Player (a computer program) compatible with the VWP, which is able to schedule
actions according to the choreography and control their execution by virtual
characters. Further, it monitors the human users' performance by comparing the
executed actions with those described by the choreography, and reacts accordingly.
This process uses a reasoner to evaluate whether actions can be performed, verifying
that all necessary conditions are met. When virtual users are present, a planner is used
to compute a plan of actions for them.
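The monitoring part of the Execution process can be pictured as a loop that compares each executed action with what the choreography currently allows. The sketch below is hypothetical. It replaces the reasoner with a simple precondition check over already-completed actions, it "reacts" only by writing to a log, and it omits the planner used for virtual characters. The actions `OpenPanel` and `RemoveScrew` are made up for the example; `GrabScrewDriver` and `M1` come from Fig. 4.

```python
from dataclasses import dataclass, field


@dataclass
class Player:
    # Hypothetical choreography: action -> actions that must be completed first.
    preconditions: dict[str, set[str]]
    done: set[str] = field(default_factory=set)
    log: list[str] = field(default_factory=list)

    def allowed(self, action: str) -> bool:
        # Stand-in for the reasoner: a known action whose prerequisites are all done.
        return action in self.preconditions and self.preconditions[action] <= self.done

    def on_user_action(self, actor: str, action: str) -> None:
        if self.allowed(action):
            self.done.add(action)
            self.log.append(f"{actor}: {action} accepted")
        else:
            self.log.append(f"{actor}: {action} rejected, not allowed yet")

    def finished(self) -> bool:
        return self.done == set(self.preconditions)


player = Player({
    "OpenPanel": set(),
    "GrabScrewDriver": set(),
    "RemoveScrew": {"OpenPanel", "GrabScrewDriver"},
})
for action in ["RemoveScrew", "OpenPanel", "GrabScrewDriver", "RemoveScrew"]:
    player.on_user_action("M1", action)
print("\n".join(player.log))
print("choreography complete:", player.finished())
```

The first `RemoveScrew` attempt is rejected because its prerequisites are not yet met. After the two prerequisite actions, the same attempt is accepted and the choreography completes.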


3 Experiments 


For the evaluation of our approach, we deployed several real-world choreographies
that were staged, with human users only, in two multiuser platforms with very distinct
characteristics. We used the VWP OpenSimulator (OpenSim) to create a realistic
multiuser 3D environment; as a counterpart system, we developed the aforementioned
messaging platform for testing purposes. It is a simplified version of the text-based
virtual worlds of the Multi-User Dungeons era, following Morgado’s definition [16].
This messaging platform has very different characteristics from OpenSim, since it does
not allow the representation of scene objects, but it nonetheless enables the
development of a team's choreography. Its interface provides the actions of the
choreography in the form of buttons; interaction is done by pressing buttons, and when
an action is performed successfully by any team member, it is communicated to all
other team members by means of a text log (Fig. 7).
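The behavior of the test messaging platform can be sketched as follows: one button per choreography action, and a shared text log that tells the other team members when an action succeeds. The class, method and member names are hypothetical. The `accept` callback stands in for whatever check the Player performs before an action counts as successful.

```python
from typing import Callable


class MessagingPlatform:
    def __init__(self, members: list[str], actions: list[str],
                 accept: Callable[[str, str], bool]) -> None:
        self.members = members
        self.buttons = list(actions)  # one button per choreography action
        self.accept = accept  # hypothetical hook for the Player's check
        self.logs: dict[str, list[str]] = {m: [] for m in members}

    def press(self, member: str, action: str) -> bool:
        """Handle a button press; broadcast it only if the action succeeds."""
        if action not in self.buttons or not self.accept(member, action):
            return False
        for other in self.members:
            if other != member:
                self.logs[other].append(f"{member} performed {action}")
        return True


platform = MessagingPlatform(["Ana", "Rui"], ["write"], lambda member, action: True)
platform.press("Ana", "write")
print(platform.logs)  # {'Ana': [], 'Rui': ['Ana performed write']}
```

The single `write` action matches the platform described in Section 2.2, which offers only that action.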

Authoring is clearly the most time-consuming and creative process, while the
semi-automatic Mapping1 and Mapping2 processes require less time and effort. Once
these processes are done, the transformation and execution processes are fully
automatic.
Fig. 7. Staging the choreography in (a) OpenSim and (b) the messaging platform
4 Related Work


There is relevant prior work addressing the description of plans that represent training
procedures to be staged by a single actor as well as by teams with several members,
and how actions are distributed among them. However, most approaches design a
choreography to be staged on a particular VWP. This creates strong dependencies on
that VWP, making the choreography difficult or even impossible to apply to other
virtual worlds. Related work can thus be categorized according to three dimensions:
modeling independence, VWP independence, and the number and type of actors.

Some approaches use separate models to represent the specification of procedures and
the scene [7], [17], [18]. They address team training scenarios, but they are strongly
dependent on the characteristics of the VWP. Other approaches attempt to bridge the
gap between the representation of procedures and their execution in distinct VWPs.
However, such approaches focus only on a single user and do not allow the
representation of teamwork [19]-[21].

In contrast, our approach is capable of representing teamwork choreographies
involving multiple users, played by either humans or virtual characters. Also, the
actions and the scene are captured conceptually in a single choreography model that
can be converted to potentially any VWP.


5 Conclusions and Future Work 


In this paper we propose an approach that allows the development of choreographies
and their adaptation and staging in potentially any VWP. To that end, based on the
concepts of MDA and the assumption that ontologies are the best way to represent
conceptual information so as to bring intelligent systems closer to the human
conceptual level, we propose an ontology that captures the semantics of a generic
choreography independently of any application domain and VWP. Further, for each
VWP an ontology representing its specific terminology and functionalities is adopted
and mapped to the generic one.

Using a set of alignments between the ontologies, we describe a complete sequence of
processes that adapts a choreography of a specific domain (but independent of any
VWP) into a choreography suitable for, and capable of being staged in, a specific
virtual world. We also describe the execution process, which monitors and manages
the staging of the choreography and uses reasoning engines to aid in the evaluation
and validation of actions.

Moreover, ontologies allow all the modeling information related to the choreography
to be integrated in the same model, i.e., the definition of procedures related to
teamwork and the information about the scene.

Using alignments between ontologies enables the automatic adaptation of the generic
ontology to the specific target ontology, hence contributing to reducing development
time and resources.

In future work, the Mapping1 and Mapping2 processes can be refined to incorporate
automatic matching mechanisms. This would further automate these processes while
reducing the need for user intervention.
Abstract. We propose the “Make Your Own Planet” workshop, which combines
handicraft and digital representation tools (3DCG effects). In this workshop, a child
uses a USB camera to select textures freely in the process of making an original 3DCG
planet. All 3DCG planets are then placed in a simulated universe for public viewing.
By watching this universe, viewers can appreciate the planet of each child. Further,
the texture of each 3DCG planet is translated to a polyhedron template and printed out
as a paper-craft template. In this process, children employ computers to transform
their planets into physical objects that they can bring home. We first describe the
workshop concept and then the method by which it was implemented. Finally, we
evaluate the workshop.


Keywords: Digital workshop, 3DCG, Unity, I/O device. 


1 Introduction 


Workshops are currently viewed as opportunities for experiential learning. As such,
various workshops are held every weekend at educational facilities such as museums
and universities. In Japan, workshops have attracted attention as places of learning.

CANVAS [1], a non-profit organization (NPO), is unique in that it promotes activities
that link technology to children’s expression. In Japan, CANVAS develops and hosts
workshops for children at various educational facilities, and every March it holds the
“Workshop Collection” at Keio University’s Hiyoshi Campus. This expo, now in its
ninth year, has grown into a big event, attracting about 100,000 parents and children
over two days. Not all the workshops in the Workshop Collection use digital
technology, but the number of those that do is increasing.

Most of the systems used in these workshops require keyboard and mouse operation.
For this reason, these workshops target older elementary school children.


2 The Concept of the Digital Workshop


2.1 The Trend of the Digital Workshop


Typical examples of workshops that use technology are those for creating handmade 
crafts through computer-aided activities. Many universities research and develop 
systems that support handmade work, such as paper crafts [2], stencil designs [3], and
pop-up cards [4]. They then hold workshops to disseminate the results of their re- 
search in society. An important purpose of these workshops is to have participants 
and children experience the “joy of creation” by making their own works. 

In these creative workshops, computers support creative activities by providing
specialized knowledge, augmenting skills, and reducing and simplifying tasks. In
other words, computers serve as specialists or professionals, so the relationship
between children and computers is hierarchical. In contrast, we attempt to provide
structures and devices that let children take an active part in creative activities using
computers, so that they can experience the “joy of creation.”

In this paper, we report on the “Make Your Own Planet” workshop, which combines
handicraft and digital representation tools (3DCG effects). In this workshop, a
child uses a USB camera to select textures freely in the process of making an original 
3DCG planet. All 3DCG planets are then placed in a simulated universe for public 
viewing. By watching this universe, viewers can appreciate the planet of each child. 
Further, the texture of each 3DCG planet is translated to a polyhedron template and 
printed out as a paper-craft template. In this process, children employ computers to 
transform their planets into physical objects that they can bring home. 


2.2 Related Work


Workshops that use computers as tools for handmade activities are quite common.
Broadly speaking, they can be divided into “programming learning systems,” “support
systems,” and “expression tool systems.” The planet maker proposed in this paper is
an expression tool system.

A programming learning system is the most common example of a workshop that
employs computers [5]. Workshop programs in which participants design robots and
program their movements are being implemented all over the world. The purpose of
these workshops is to understand the features of sensor devices and programming
languages. Because an understanding of algorithms and complex operations is
necessary, they are not appropriate for younger children.

A support system, in which the computer provides knowledge and assistance, can
reduce the difficulty of shaping activities, making it possible even for children and
beginners to produce complex handwork. In recent years, support systems for paper
crafts [6] and pop-up cards [7] have been developed.

An expression tool system provides users with expressive activities on a computer.
Such systems are especially common in media art. With “Minimal Drawing” [8], one
can draw pictures on a rotating virtual canvas. “Body paint” [9] allows users to draw
on walls using their own bodies as brushes.

“I/O Brush” [10] is closely related to our system. It heightens children’s drawing
experience by letting them draw with “ink” picked up from real-world objects, thus
encouraging their creativity. Our system differs in that it lets children create a
three-dimensional work, and children can view each other’s works at the public
viewing. In our system, a three-dimensional planet takes the place of the drawing
canvas. A child’s planet is on public view as a 3DCG animation that simulates the
universe. Moreover, the system can print the paper-craft template on the spot.


3 System Development 


The planet maker consists of three modules: the “Paint Module,” the “Space Display
Module,” and the “Mapping Module.” In this section, we describe how each module
was developed and what it does. Figure 1 shows an overview of the system.


[Figure: system overview with the components User, Real Object, USB Camera (shooting), Sphere Type Interface (drawing), Paint Module (Unity3D), Router, Texture Server, Projector, Mapping Module, Printer and Template of Icosahedron]


Fig. 1. Overview of the system


3.1 Paint Module 


Children can paint their own planets with the Paint Module. This module paints a
3DCG spherical object, using an image captured with a USB camera as ink. Figure 2
shows the principle of the drawing method.


Spherical Interface 

We developed an original tangible interface so that children could easily paint textures
on a sphere. This interface consists of an Arduino and a potentiometer. There are
several controls on the sphere interface: one performs screen transitions, another is the
shutter release for the USB camera, and a third is a volume knob that adjusts the
texture. These mechanisms are controlled by the Arduino, an I/O device. Figure 2
shows the Paint Module GUI and the Spherical Interface.

First, the user takes a picture of an object to use as ink. Next, this texture is painted
onto the 3DCG sphere on the display by specifying a location on the spherical
interface. The paint location on the sphere is specified by touching the guide, which
runs above the sphere along a line of longitude. The sphere-type interface rotates about
its central axis, so the user can draw as if with a brush.
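One plausible way to turn the interface readings into a paint position on the planet texture is sketched below. It goes beyond what the paper states in three assumptions. First, the sphere's rotation and the finger position on the guide are each read as a 10-bit Arduino analog value (0 to 1023). Second, the rotation gives longitude and the guide gives latitude. Third, the texture uses an equirectangular (longitude/latitude) UV layout, as on a standard UV sphere. The function names and the texture size are hypothetical.

```python
ADC_MAX = 1023  # assumed 10-bit analog reading range on the Arduino


def reading_to_fraction(raw: int) -> float:
    """Clamp a raw analog reading and scale it to [0, 1]."""
    return min(max(raw, 0), ADC_MAX) / ADC_MAX


def paint_uv(rotation_raw: int, guide_raw: int) -> tuple[float, float]:
    """Map sphere rotation (longitude) and guide position (latitude) to (u, v)."""
    u = reading_to_fraction(rotation_raw)  # 0..1 around the equator
    v = reading_to_fraction(guide_raw)  # 0 = bottom pole, 1 = top pole
    return u, v


def uv_to_pixel(u: float, v: float, width: int, height: int) -> tuple[int, int]:
    """Convert (u, v) to pixel indices, with row 0 at the top of the image."""
    x = min(int(u * width), width - 1)
    y = min(int((1.0 - v) * height), height - 1)
    return x, y


# Hypothetical readings: sphere turned halfway, finger at the top of the guide.
print(uv_to_pixel(*paint_uv(512, 1023), 2048, 1024))  # (1025, 0)
```

Stamping the captured "ink" image around the returned pixel, with a radius taken from the pen-size control, would complete the brush stroke. That step is not shown here.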


[Figure: Paint Module GUI; (c) Spherical Interface with guide, pen size and volume controls; processing image for texture drawing, mapping display coordinates (x, y) to texture coordinates (u, v)]


Fig. 2. Paint Module and Spherical Type Interface


3.2 Space Display Module 


The Space Display Module is a public-viewing module that displays all the planets
designed by the children. The texture of each planet made with the Paint Module is
registered in a texture database, and the Space Display Module displays a planet for
each texture newly added to the database. By watching this space, children can
appreciate each other’s planets.
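The hand-off between the Paint Module and the Space Display Module can be sketched as a registry that the display polls for textures added since its last check. This in-memory stand-in for the texture server and its method names are hypothetical. The paper does not describe the database schema or how new entries are detected, and the child identifiers and file paths below are made up.

```python
from dataclasses import dataclass, field


@dataclass
class TextureDatabase:
    # Each entry is (id, child, texture path); ids increase monotonically.
    entries: list[tuple[int, str, str]] = field(default_factory=list)

    def register(self, child: str, texture_path: str) -> int:
        new_id = len(self.entries) + 1
        self.entries.append((new_id, child, texture_path))
        return new_id

    def added_since(self, last_id: int) -> list[tuple[int, str, str]]:
        return [entry for entry in self.entries if entry[0] > last_id]


class SpaceDisplay:
    def __init__(self, db: TextureDatabase) -> None:
        self.db = db
        self.last_seen = 0
        self.planets: list[str] = []

    def refresh(self) -> int:
        """Add one planet per newly registered texture; return how many were added."""
        new = self.db.added_since(self.last_seen)
        for entry_id, child, path in new:
            self.planets.append(f"planet by {child} ({path})")
            self.last_seen = entry_id
        return len(new)


db = TextureDatabase()
display = SpaceDisplay(db)
db.register("child-01", "textures/planet_001.png")
db.register("child-02", "textures/planet_002.png")
print(display.refresh())  # 2
print(display.refresh())  # 0
db.register("child-03", "textures/planet_003.png")
print(display.refresh())  # 1
```

Tracking the last identifier seen, rather than re-reading the whole database, means each planet enters the simulated universe exactly once.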


3.3 Mapping Module


The texture of each 3DCG planet is translated by the Mapping Module into a polyhedron
template and printed out as a paper-craft template. In this process, children are able to
transform the planets that they made with computers into physical objects that they
can bring home. In other words, children can make digital as well as physical works.
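Fig. 1 labels the printed output as an icosahedron template. A minimal sketch of the geometric step is shown below. It builds the 12 vertices of a regular icosahedron on the unit sphere and converts each vertex to texture coordinates, so that each triangular face of the template could be filled by sampling the planet texture. The equirectangular layout (with y as the polar axis) and the function names are assumptions. Unfolding the 20 faces into a printable net is not shown.

```python
import math


def icosahedron_vertices() -> list[tuple[float, float, float]]:
    """12 vertices of a regular icosahedron, scaled onto the unit sphere."""
    phi = (1 + math.sqrt(5)) / 2  # golden ratio
    raw = []
    for a in (-1.0, 1.0):
        for b in (-phi, phi):
            # Cyclic permutations of (0, +-1, +-phi).
            raw += [(0.0, a, b), (a, b, 0.0), (b, 0.0, a)]
    norm = math.sqrt(1 + phi * phi)
    return [(x / norm, y / norm, z / norm) for x, y, z in raw]


def sphere_to_uv(x: float, y: float, z: float) -> tuple[float, float]:
    """Equirectangular texture coordinates for a unit vector, y as the polar axis."""
    u = 0.5 + math.atan2(z, x) / (2 * math.pi)
    v = 0.5 + math.asin(max(-1.0, min(1.0, y))) / math.pi
    return u, v


for vertex in icosahedron_vertices()[:3]:
    u, v = sphere_to_uv(*vertex)
    print(f"vertex {vertex} -> texture ({u:.3f}, {v:.3f})")
```

Sampling the texture at points inside each face (not only at the vertices) would give the printed pattern. Sampling at the vertices only would be far too coarse for a recognizable planet.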


3.4 Planet Sheet 


In the workshop, the creator’s name is written on a planet sheet, and the part of the
sheet that names the planet is captured as a picture with a USB camera.

This picture appears as a label on the planet’s preview screen in the Space Display
Module. Figure 2 shows the Paint Module GUI and the Sphere Interface, and Figure 3
shows the flow of making the paper-craft template.


Panels: (a) Planet sheet; (b) Texture of planet; (c) Template of paper craft; (d) Paper craft making

Fig. 3. Flow of Making a Paper Craft Template


4 Make Your Own Planet Workshop 


To evaluate our design, we conducted a workshop at Workshop Collection 9 in Hiyoshi, Yokohama. The workshop targeted children aged six years or older, and each participant spent about 60 minutes on it.

The workshop used four client terminals, and five instructors were assigned as facilitators to guide it. After the workshop, a survey was carried out to evaluate it.


5 Discussion 


We conducted a survey consisting of five questions. Four of them evaluated the interface, the degree of work sharing, satisfaction with one's own work, and motivation for future work; the fifth asked about the attractive elements of “Make Your Own Planet.”


“Make Your Own Planet”: Workshop for Digital Expression and Physical Creation 121 


We obtained survey results from 261 children at the workshop. Table 1 shows the questions and answer items, and Figure 4 shows the results of questions 1-4.


Table 1. Questionnaire and Items

Q1. Was it easy to make the planet with the camera and the sphere-shaped controller?
Answers: Very easy / Easy / Became easy with use / Difficult

Q2. Did you refer to your friends' planets when making your planet?
Answers: Referred / Often referred / Did not refer at all

Q3. Are you satisfied with the planet you made yourself?

Q4. If you have a chance, would you like to make a planet in this workshop again?
Answers: Very much / If there is time / A little / Not at all

Q5. Please rank the following in order of how much fun they were in this workshop.
A. Drawing the pattern of the planet with the sphere-shaped controller.
B. Being able to choose your own design image.
C. Having your planet appear in the universe of the public viewing.
D. Being able to see the planets of many friends.
E. Being able to make a paper craft of your own planet.

Fig. 4. Results of the Q1-Q4 questionnaires


Concerning the usability of the system, 44.4% of the children stated that it was “very easy” or “easy” to use; if “became easy with use” is included, positive opinions reached 88.2%. For work sharing, most children stated that they did not refer to the creations of others at all: nearly 70% created an original planet without referencing those of other participants. An overwhelming 93.2% answered that they were “very satisfied” or “satisfied” with their planets. As for motivation for future work, 75.9% of the children wished to repeat the experience. Since there were no negative opinions, satisfaction with the workshop and motivation were evidently high, and the workshop appealed to the children.
Fig. 5. Answers to the Q5 questionnaire


Figure 5 shows the results of the fifth question, in which the children ranked the activities. They found the ability to choose a pattern freely the most interesting aspect, and they were also very interested in using the planet's spherical controller. Creating a paper planet received third place. These results suggest that making physical objects as paper crafts increased the children's motivation for creative activity.

About 30% of the children referred to the works of others when creating their own. The item on appreciating friends' works scored lower than the other items. The children nevertheless felt positively about having their works shown in a public place.

The space display module was received positively, but it did not function effectively as a tool for stimulating the children's ideas through comparison of their works. Although children liked their own creations, they also expressed a strong desire to view their peers' planets. Operating the spherical interface required some practice, but we succeeded in providing an environment in which children could create with computers. We thus designed and operated a fully functioning digital workshop that offered a distinctive creative activity.


6 Future Work


Two points should be addressed in future work. The first is the need to improve the spherical interface. The children found it hard to paint parts of the pole area of the sphere, because the joint that mounts the sphere on the interface base occupies part of the pole, leaving a narrower surface to draw on. We will therefore improve the spherical interface, as shown in Figure 6. The second point is the need to improve the space display module used for public viewing, since we found that the children had difficulty viewing the works. To address this, we are currently developing a system that allows planets to be viewed on the Web.
Fig. 6. Image of Improved Spherical Interface
Abstract. This paper reports on a user-centered formative usability evaluation of diverse visualization technologies used in virtual museums. It first presents the selection criteria and the five museum websites involved in the analysis. It then describes the evaluation process, in which a group of subjects explored the museums' online resources and answered two usability questions concerning their overall reaction to each website and their subjective satisfaction. After user testing, quantitative and qualitative data were collected and statistically analysed. Further research is needed with a larger sample, different methodologies, and varied contexts.


Keywords: history and culture, digital humanities, cultural informatics.


1 Introduction 


The London Charter encourages virtual museums to promote rigorous design of digital heritage visualization; it also recommends that virtual museums ensure that embedded visualization paradigms follow a human-centric design, so that they promote the study and interpretation of cultural heritage assets. Principle 2 of the London Charter states that computer-based visual media should be employed when they provide added value for the study of cultural assets compared to other methods. It stresses that, in order to determine the suitability of each technologically driven visualization method, a systematic evaluation of such methods should be carried out based on specific evaluation criteria. Relevant research sources should be identified and evaluated in a structured and documented way, taking into account best practice within communities of practice. The London Charter's main goal is to encourage dissemination of computer-based visualization so that visitors can identify significant relationships between cultural elements. Such dissemination should aim to strengthen the study, interpretation, and preservation of cultural heritage.
Previous studies have evaluated museum websites using design patterns [1]; assessed the usability of virtual museum websites [2, 3] with both empirical and expert-based methods combining quantitative and qualitative research [4, 5]; explored the relationship between the sense of presence, previous user experience, and enjoyment [6]; examined the effect of various visualisation technologies on the sense of presence [7]; developed guidelines on issues ranging from design considerations to project philosophies [8]; and explored requirements for online art exhibitions [9]. The main goal of this paper is to explore the usability parameters that can be used as a reference for evaluating virtual museums, which often incorporate varied technological elements. After a short introduction to virtual museums, the cases selected for this research, and usability evaluation, we present the participants, the experimental procedure, and the methods used for the statistical analysis. The last section analyses and discusses the research results.


2 Virtual Museums 


A virtual museum [10] is a complex environment that, depending on the choices of the design team, determines the visitors' final experience and subsequent attitudes towards the use of digital media in museums. To cluster the wide range of existing museum websites into representative categories, a team of four scientists experienced in interactive design and in the use of Information and Communication Technologies in culture and education was assembled. Museum online resources were divided according to the presentation method employed for their visualization and classified into five technology-oriented categories of museum sites: (1) panoramic images (QTVR), (2) scalable images with text, (3) searchable databases, (4) 3D environments, and (5) videos.

The experts shared a preselected pool of museum websites and worked independently to extract, within these categories, the factors that may influence the user's experience, based on their personal understanding and on recent research literature on evaluation strategies for virtual museums. The factors were then merged into a set of five qualities or capacities: imageability, interactivity, navigability, virtual spatiality, and narration, as explained in Table 1. Of the five representative cases of virtual museums presented below, four serve as extensions of existing physical museums, while one is entirely imaginary.


Imageability: Panoramic Images. Imageability is defined as the “quality in a physical 
object that gives it a high probability of evoking a strong image in any given observer. 
It is shape, colour, or arrangement, which facilitate making of vividly identified, powerfully structured, highly useful mental images of the environment” [11, p. 9]. In virtual environments (VEs) of high imageability, users can experience the real museum space through panoramic images that can be manipulated with a set of interactive tools, such as rotate and pan, zoom in and out, and even navigate. The case selected for this study, labeled M1, is the "Virtual Exhibition Tours" (http://www.nga.gov/onlinetours/index.shtm) of the National Gallery of Art in Washington. In this online environment, visitors can select specific works of art for larger image views, close-up details, streaming audio commentary, and information about the object (Figure 1).


Table 1. Qualities of museum online resources.

Quality: Definition
1. Imageability: Perceptual quality of a VE that makes it memorable
2. Interactivity: The HCI functionality that makes a VE able to communicate with its visitors
3. Navigability: The degree to which navigation capabilities are perceived from structural elements of the VE
4. Virtual spatiality: The extension of physical museum space and the metaphors of architecture to virtual space
5. Narration: Narration via a collection of videos that engages the virtual visitors, providing them the opportunity to investigate a theme in a variety of ways and construct their own meaning



Fig. 1. Screenshot of the Van Gogh Virtual Exhibition Tour at the National Gallery of Art 


Interactivity with Scalable Images and Texts. Image scalability provides the oppor- 
tunity to examine museum artifacts or parts of them in detail by applying zoom tools 
over high resolution images. These zoom-on-demand features allow viewing aspects 
of photos that are not visible to the naked eye because of their small size or because of 
the museums' spatial proximity restrictions. Image exploration tools make VEs highly 
interactive and enhance the museum experience [6]. The case selected for this study, labeled M2, is the Metropolitan Museum of Art in New York (http://www.metmuseum.org/) (Figure 2).
Fig. 2. Snapshot of an object contained in the section of Greek and Roman Art of the Metropolitan Museum of Art


Navigability: Searching Utility for Images and Texts. This type of online museum environment offers multiple search options and enhanced image manipulation. Searchable databases typically contain 2D representations in the form of photos and flat
scans of objects along with their corresponding metadata, which are uploaded to the 
museum’s online database. The hallmark of these sites is a search engine, which al- 
lows searching by content, concept, or metadata, thanks to an entry point usually con- 
sisting of a text area in which visitors enter search criteria based on keywords. The 
case selected for this study, labeled as M3, is the Museum of Modern Art 
(http://www.moma.org/explore/collection/index). Through its database, visitors can 
navigate the various thematic areas of the museum, and search its collections by artist, 
work or keyword. It also has an advanced search that allows adding refinement crite- 
ria such as period or object management status (Figure 3). 


Fig. 3. The advanced search engine of the Museum of Modern Art online database 
Virtual Spatiality: Simulation of a 3D Reconstructed Museum Space. Since place and space have been inseparable in our experience of the real world until now, when we experience the Web's placeness, we assume that it must also have the usual attributes of spatiality [Weinberger 2002, p. 56 after 12]. This type of online resource allows ‘free’ and interactive real-time navigation in a 3D space that reproduces the museum galleries more or less realistically. Such resources usually seek to reproduce the experience of the visit as realistically as possible, with the added value of multimedia information, hypertext/spatial navigation, and the possibility of manipulating objects (zooming, rotation). The case selected for this study, labeled M4, is the Van Gogh Virtual Museum (http://www.vangoghmuseum.nl/), a typical example of a 3D reconstruction of a museum setting using computer-aided design tools and gaming technologies (Figure 4).


Fig. 4. Snapshot of the Van Gogh Virtual Museum 


Narrative Videos. The last category corresponds to virtual museum websites containing embedded narrative videos. The case selected for this study, labeled M5, is the Virtual Silver Screen of Library and Archives Canada (http://www.collectionscanada.ca/silverscreen/). The website uses Flash technology to present Canadian films of the early 20th century, treated as historical documents and organized by themes that the user can select for viewing (Figure 5).


Fig. 5. Snapshot of the home page of the Virtual Silver Screen
3 Usability Evaluation


According to the ISO 9241 standard, ‘Ergonomic requirements for office work with visual display terminals’ (ISO, 1998), the usability of a system is its ability to function effectively and efficiently while providing subjective satisfaction to its users. The usability of an interface is usually associated with five parameters derived directly from this definition (ISO, 1998; Nielsen, 1993): an interface (a) is easy to learn, (b) is efficient to use, (c) is easy to remember, (d) produces few errors, and (e) is pleasant to use.

The QUIS (Questionnaire for User Interaction Satisfaction) questionnaire (Shneiderman and Plaisant, 2005) was used to assess participants' satisfaction while interacting with the virtual museums, and served as the main instrument for recording their subjective assessments. The QUIS questionnaire consists of 7 parts. Part 1, concerning general experience with ICT (Information and Communication Technologies), is often omitted. Part 2 assesses the overall user reactions to the evaluated system, Part 3 concerns the window layout of the system, Part 4 the terminology used, Part 5 the learnability of the interface (how easy it is to learn), and Part 6 the system capabilities. For this research, we selected the parts of the questionnaire that concern the overall reaction to the website and the subjective satisfaction of the users.


4 Materials and Methods 


Because of technical restrictions, not all virtual museum websites could run on tablets, smartphones, and other portable devices. The experiment was therefore conducted on an HP workstation with two 2.4 GHz Xeon processors, 2048 MB of memory, and a 19-inch screen, to ensure the same testing conditions for all participants.


4.1 Participants 


A total of one-hundred sixty-four (164) volunteers (males and females, aged 19-37), 
mainly undergraduate and postgraduate students from the Aristotle University of 
Thessaloniki, Greece, participated in the experiment. Virtual visits for academic or 
professional research are considered the most demanding kind of visits in a virtual 
museum because they are targeted and have defined learning requirements and time constraints. Random or unintended visits would contribute less to this study. Returning (physical or virtual) visitors would also have introduced inconsistency because of their previous experience and knowledge.

All participants reported having at least basic knowledge of computers and good knowledge of English. None of the selected students had visited the virtual museum websites before. Participants in all conditions were naive as to the purpose of the experiment.
4.2 Experimental Procedure


The premise of the proposed research is that different visualisation methods, with their specific associated capacities, serve different aims connected with usability, presence, motivation, and learning outcomes. The proposed evaluation methodology is based on questionnaires assessing these aspects of the virtual museum experience, administered after participants navigated the selected virtual museums presented in Section 2. The results will be used for a comparative evaluation of approaches to the presentation of, and interaction with, museum artefacts. The evaluation and the interviews took place at the Laboratory of Photogrammetry and Remote Sensing of the Aristotle University of Thessaloniki, Greece, under laboratory conditions with no visitors allowed, so that users could concentrate on completing the questionnaires. Only one participant was evaluated at a time, and assistants helped participants when needed. User errors during navigation and the time needed to complete the tasks were not recorded, because our intention was to test the websites' performance, not the users' performance. The evaluation used cued testing, which involves explaining the purpose of the project to the users and asking them to perform specific tasks or answer questions. Four steps were undertaken:


1. Goal setting: users start with a plan of the tasks to be accomplished. 

2. Exploration: users explore the interface and discover useful actions. 

3. Selection: users select the most appropriate actions for accomplishing their task. 
4. Assessment: users interpret the system's responses and assess its progress.


The participants were allowed to select the virtual exhibitions and exhibits they preferred, so that they felt in control of their own learning. The same procedure was repeated for each of the five museums, with one participant at a time. Each participant experienced all websites, in a randomly determined order. The questionnaires were completed directly after exploring the virtual museum websites.


4.3 Statistical Analysis 


The responses were then subjected to statistical analysis, which was divided into two parts. An initial prediction stipulated that virtual museum M4 would be the most suitable for learning. According to the previously mentioned research on constructivist learning and serious games, the reason would be that it simulates a real visit (emotional component) and allows self-controlled navigation in a reconstructed space as well as interaction with objects. The first part of the analysis sought to verify whether M4 provided the most efficient and engaging experience. Before the analyses, the answers to the virtual museum questions were tested for normality using the Shapiro-Wilk test and the one-sample Kolmogorov-Smirnov test. We then applied the non-parametric Kruskal-Wallis test to each question to test the null hypothesis that the scores are similar for all museums; the null hypothesis was rejected when p < 0.05.
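To make the Kruskal-Wallis step concrete, the sketch below computes the H statistic with average ranks for tied scores and the standard tie correction, then converts H to a p-value from the chi-square distribution with k - 1 degrees of freedom. To stay dependency-free it uses the simple closed form of the chi-square survival function that holds for an even number of degrees of freedom, which covers the five-museum case (4 degrees of freedom). In practice a statistics package such as SciPy's `scipy.stats.kruskal` would be used; the helper names here are hypothetical and not taken from the paper.

```python
import math
from collections import Counter


def average_ranks(values):
    """1-based ranks of values; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values equal to values[order[i]].
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1  # sorted positions i..j hold ranks i+1..j+1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def chi2_sf_even_df(x, df):
    """P(X >= x) for a chi-square variable with a positive even number of degrees of freedom."""
    if df <= 0 or df % 2:
        raise ValueError("this closed form requires a positive even df")
    half = x / 2.0
    term, total = 1.0, 1.0
    for i in range(1, df // 2):
        term *= half / i  # accumulates (x/2)^i / i!
        total += term
    return math.exp(-half) * total


def kruskal_wallis(*groups):
    """Return (H, p) for k independent groups of scores (e.g. one group per museum)."""
    pooled = [v for g in groups for v in g]
    n = len(pooled)
    ranks = average_ranks(pooled)
    rank_term, start = 0.0, 0
    for g in groups:
        rank_sum = sum(ranks[start:start + len(g)])
        rank_term += rank_sum ** 2 / len(g)
        start += len(g)
    h = 12.0 / (n * (n + 1)) * rank_term - 3 * (n + 1)
    # Tie correction: Likert-type answers produce many ties.
    ties = sum(t ** 3 - t for t in Counter(pooled).values())
    correction = 1 - ties / (n ** 3 - n)
    if correction > 0:
        h /= correction
    return h, chi2_sf_even_df(h, len(groups) - 1)
```

With one list of answers per museum, `h, p = kruskal_wallis(m1, m2, m3, m4, m5)` yields the statistic for one question, and the null hypothesis is rejected when `p < 0.05`.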
For the questions where the Kruskal-Wallis test indicated a significant difference, we used the non-parametric Mann-Whitney U test to identify which pairs of museums differed. Finally, we adjusted p-values for multiple comparisons by multiplying each p-value by the number of comparisons (e.g., an unadjusted p = 0.011 × 10 gives an adjusted p = 0.11). A significance level of 0.05 was maintained (the adjusted p must be < 0.05 to be significant), while p < 0.1 may indicate a trend that might have reached significance with a larger sample.
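Multiplying each p-value by the number of comparisons is the Bonferroni correction. The sketch below enumerates the pairwise comparisons among the five museums, applies the correction (capping at 1), and labels the result with the 0.05 significance level and 0.1 trend threshold stated above. The names `bonferroni` and `classify` are illustrative.

```python
from itertools import combinations

MUSEUMS = ["M1", "M2", "M3", "M4", "M5"]
ALPHA, TREND = 0.05, 0.10  # thresholds stated in the text


def bonferroni(p_raw, n_comparisons):
    """Adjusted p-value: raw p multiplied by the number of comparisons, capped at 1."""
    return min(1.0, p_raw * n_comparisons)


def classify(p_adj):
    """Label an adjusted p-value as significant, a trend, or not significant."""
    if p_adj < ALPHA:
        return "significant"
    if p_adj < TREND:
        return "trend"
    return "not significant"


pairs = list(combinations(MUSEUMS, 2))  # every unordered pair of museums
m = len(pairs)                          # 5 museums -> 10 pairwise comparisons
```

Applied to the example in the text, an unadjusted p = 0.011 becomes 0.11, which is neither significant nor a trend under these thresholds.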


5 Results and Discussion 


According to the Shapiro-Wilk test of normality, only a few questions in a few museums followed a normal distribution. We therefore applied the non-parametric Kruskal-Wallis test to each question to test the null hypothesis that the scores are similar for all museums, rejecting it when p < 0.05. For the questions highlighted by the Kruskal-Wallis test, we used the Mann-Whitney test to identify which museums differed, adjusting p-values for multiple comparisons by multiplying each p-value by the number of comparisons (10 pairwise comparisons among 5 museums; e.g., an unadjusted p = 0.011 × 10 gives an adjusted p = 0.11). On this basis, the statistically significant differences are as follows.

For the question concerning the user's overall reaction to the virtual museum webpage, there were statistically significant differences between the following museums:


• The interactive virtual museum with scalable images and texts (M2) (MDN = 5, IQR = 6.5-5) scored better than the virtual museum with panoramic images (M1) (MDN = 5, IQR = 6-5), adjusted p = 0.00. Novel interactive technologies that permit close inspection of virtual museum objects, giving users the opportunity to observe the details of museum objects on screen, are more attractive and engaging for virtual visitors than panoramic images, which do not provide enough contextual information about the exhibits and partially distort the museum interior they present.

• The virtual museum simulating a 3D reconstructed museum space (M4) (MDN = 6, IQR = 7-5) scored, as expected, better than the virtual museum with panoramic images (M1) (MDN = 5, IQR = 6-5), adjusted p = 0.00. In the simulated 3D virtual museum environment, the virtual visitor can freely navigate, select, and obtain information about the exhibits, whereas with panoramic images the functions of the virtual museum and the opportunities for navigation and exploration are limited.

• The interactive virtual museum with scalable images and texts (M2) (MDN = 6, IQR = 6.5-5) scored, as expected, better than the virtual museum with the searching utility for images and texts (M3) (MDN = 5, IQR = 4-6), adjusted p = 0.00.

• The virtual museum simulating a 3D reconstructed museum space (M4) (MDN = 6, IQR = 7-5) scored, as expected, better than the virtual museum with the searching utility for images and texts (M3) (MDN = 5, IQR = 4-6), adjusted p = 0.00. This result is influenced by the phenomenon of disconnectedness, in which virtual visitors jump from page to page [13, after 14] rather than following a progressive path, as they do in a simulated 3D virtual museum environment that provides a clearer perception of its exhibits.


For the question concerning subjective satisfaction:


• The interactive virtual museum with scalable images and texts (M2) (MDN = 6, IQR = 7-5) and the virtual museum simulating a 3D reconstructed museum space (M4) (MDN = 6, IQR = 7-5) scored better than the virtual museum with the searching utility for images and texts (M3) (MDN = 5, IQR = 6-4), adjusted p = 0.03.


The virtual museum with the searching utility for images and texts received low scores. This can be explained by the fact that, although it is useful for people such as students and experts who search for specific information, it cannot connect the virtual museum exhibits with their context and the virtual museum space, make a deep impression, or convey a virtual museum experience accompanied by a sense of entertainment, joy, and learning.


Acknowledgments. The authors would like to thank the participants who took part in the experiment.
Abstract. This paper presents a location-based augmented reality application: a mobile application that aims to facilitate the journey of the millions of pilgrims performing Hajj and Umrah and to help them overcome the difficulties they face. Using augmented reality, the application displays different types of information about the pilgrims' surroundings in the mobile camera view. Usability testing of the proposed application was completed successfully, with a very high rate of positive feedback from users.


Keywords: Location-based Augmented Reality, GPS, compass, accelerometer.


1 Introduction 


Millions of pilgrims come to Mecca¹ every year to perform Hajj² during the days of Hajj, or to perform Umrah³ at any time during the year. Statistics on the number of pilgrims who came to perform Hajj in the last five years are given in Table 1 [1]. Pilgrims need all the information necessary to accomplish their spiritual journey, such as which rituals they must perform, which places they must visit, where these places are located, and how far away they are. They can get this information from several sources: their campaign, volunteers, conventional maps, or signs. However, these sources have drawbacks. Some are not always available; for example, volunteers may be scarce, and joining a campaign forces pilgrims to stay with the group at all times, which prevents them from moving freely. Some sources, such as street signs, do not give pilgrims accurate and adequate information and are therefore not sufficient to guide them. Carrying manuals at all times is impractical, and manuals are not always trusted because anyone can print and publish them. Some pilgrims cannot read maps. In addition, many foreign pilgrims visit Mecca only once in their lifetime and would like to learn more about the holy places during their stay.


¹ The holy city in the Kingdom of Saudi Arabia, which contains the holy places of Islam.
² Literally means "to set out for a place"; for a Muslim, that place is the Holy City of Mecca.
³ Visiting the holy places and performing Tawaf around the Kaaba and Sa'i between Al-Safa and Al-Marwah, after assuming Ihram (a sacred state).


R. Shumaker and S. Lackey (Eds.): VAMR 2014, Part II, LNCS 8526, pp. 134 2014. 
© Springer International Publishing Switzerland 2014 
Table 1. Distribution of Pilgrims per year


Year Total number of Pilgrims 
2009 2,313,278 
2010 2,789,399 
2011 2,927,717 
2012 3,161,573 
2013 1,980,249 


To address these issues, this paper presents a mobile application based on Augmented Reality (AR) that aims to facilitate the Hajj and Umrah journey for pilgrims through their mobile phones. The proposed application, called Manasek AR, displays all the information pilgrims need about their surroundings in a mobile camera view.

The rest of the paper is organized as follows. Section 2 gives a brief overview of related work, and Section 3 describes the proposed software application. Section 4 discusses the results of the usability testing. Finally, Section 5 concludes the paper.


2 Related Work 


Augmented reality (AR) is considered a variation of virtual reality (VR). In AR, the user can see the real world, with virtual objects superimposed upon or composited with it [2]. In other words, a typical AR environment has digital information transposed onto a real-world view, whereas in VR the user is totally immersed in a virtual or synthetic world. AR therefore supplements reality rather than completely replacing it. In [3], an AR system is defined as one in which the user interacts with the real world as supplemented with 3D virtual objects.

Several papers have been written on augmented reality [2], [3], [4], and many application areas use this technology, such as medicine [5], [6], [7], [8], the military [9], manufacturing [10], [11], and entertainment [12].

Augmented reality is now emerging as an important technology for many commercial applications in different fields. In tourism, for example, many AR phone applications have been developed, such as Wikitude [13] in 2008 and Layar [14].


3 Methodology 


Manasek AR uses location-based augmented reality to improve the Hajj and Umrah experience for pilgrims and to help them overcome the difficulties they face. It provides complete guidance for pilgrims by giving them all the information they need about Hajj and Umrah places in a new way that engages them with their immediate surroundings.
Manasek AR locates a place on the mobile screen within the view of the pilgrim's mobile camera. The position of the place relative to the pilgrim is calculated from the pilgrim's GPS position, and the direction of the device is calculated using the compass and accelerometer. When the place is located, augmentation takes place by displaying a description of the place on the mobile screen. An Internet connection is not required for this service, because once a place is located, its related information is retrieved from a database that contains all the information about the saved places. The architecture of the application is given in Figure 1.
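The positioning described above can be illustrated with standard geodesy. This sketch is not the Manasek AR implementation: it assumes a spherical Earth (haversine distance), a compass heading in degrees clockwise from true north, and a camera with a 60-degree horizontal field of view on a 1080-pixel-wide screen; all function names and default parameters are hypothetical. Only the compass heading is used; the accelerometer, which the application also uses (for example, for device tilt), is not modelled.

```python
import math

EARTH_RADIUS_M = 6_371_000.0  # mean Earth radius in metres


def distance_m(lat1, lon1, lat2, lon2):
    """Great-circle (haversine) distance in metres between two GPS fixes."""
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dp, dl = p2 - p1, math.radians(lon2 - lon1)
    a = math.sin(dp / 2) ** 2 + math.cos(p1) * math.cos(p2) * math.sin(dl / 2) ** 2
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(a))


def bearing_deg(lat1, lon1, lat2, lon2):
    """Initial bearing from point 1 to point 2, clockwise from true north, in [0, 360)."""
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dl = math.radians(lon2 - lon1)
    y = math.sin(dl) * math.cos(p2)
    x = math.cos(p1) * math.sin(p2) - math.sin(p1) * math.cos(p2) * math.cos(dl)
    return math.degrees(math.atan2(y, x)) % 360.0


def screen_x(target_bearing, heading, fov_deg=60.0, width_px=1080):
    """Horizontal pixel at which to draw a place, or None if it is outside the camera view."""
    # Signed angle from the camera heading to the place, in [-180, 180).
    rel = (target_bearing - heading + 540.0) % 360.0 - 180.0
    if abs(rel) > fov_deg / 2:
        return None
    return width_px / 2 + rel / (fov_deg / 2) * (width_px / 2)
```

For instance, a place whose bearing is 10 degrees clockwise of the current heading is drawn at x = 720 px under these assumptions, and `distance_m` supplies a distance label of the kind the application shows for each place.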


Step 1: The pilgrim holds the camera in front of a place to get a live view and see its description.
Step 2: The position of the place is calculated using the pilgrim's GPS position, and the direction of the device is calculated using the compass and accelerometer.
Step 3: When the place is located, the related data are retrieved.
Step 4: Finally, augmented reality takes place by overlaying the retrieved description on the mobile screen.

Fig. 1. Manasek AR Architecture


The application lets pilgrims choose the type of information: the locations of Hajj and Umrah places (including how far the pilgrim is from them), historical information about the places, or guidance on how to perform the rituals related to a particular place (Manasek information). It also allows pilgrims to add the location of their campaign, view maps, and get recent news about Hajj conditions through the official Twitter account of the Ministry of Hajj and Umrah.


3.1 Application Features


The features provided by the proposed application are described in the following.


AR Function and Information Types. This function allows pilgrims to choose the type of information: the locations of Hajj and Umrah places, historical information about these places, or guidance on how to perform the rituals related to a particular place (Manasek information). In Figure 2, location information about King Abdulaziz Gate is displayed. The location information consists of the distance between the


Manasek AR: A Location-Based Augmented Reality Application 137 


pilgrim and the gate (130 m) and the position of the gate in the Holy Mosque. Manasek information about the Kaaba* is given in Figure 3; it describes the rituals that should be performed at this place. Figure 4 displays historical information about the Kaaba. The user can swipe the top bar to select a specific information type.


Fig. 3. Manasek Information


* The cuboid building at the centre of the Holy Mosque in Mecca.
(Screenshot: the Historical Information tab showing the entry for Hajr-al-Aswad, the Black Stone.)


Fig. 4. Historical Information


Help Function. To explain to users how to use the application properly, the developers designed a help overlay that explains all the interfaces and functions in a simple, attractive way suited to smartphones. The following steps show how to use this function:


1. The user touches the help button (the question-mark button, following Android's standard set of action buttons) at the top right of each interface screen, as shown in Figure 5.

2. A semi-transparent panel is displayed over the original interface; it contains arrows and brief descriptions of how to use each interface component.

3. To remove the instruction panel, the user touches the help button again.


(Help overlay labels: 1: Swipe the type of information; 2: Touch the radar to show the zoom bar; 3: Touch the marker to show information; 4: Touch anywhere in the canvas to remove the description box; 5: Close help.)


Fig. 5. Help Function
Radar View Function. The radar displays icons for places within a selected range around the user, including those outside the camera's field of view. It also acts as a compass, showing the user's orientation whenever he/she moves the device. The icons on the radar, which indicate places around the user, move in and out of view as the user moves and rotates. The radar view is shown at the top-left corner of the "Manasek" tab, as illustrated in Figure 6.
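One plausible way to place an icon on such a radar, offered only as an illustrative sketch, is to rotate the place's bearing by the device heading and scale its distance by the selected range. The function name and the pixel convention (screen y growing downwards, so "ahead" is at the top) are assumptions, not taken from the application.

```python
import math


def radar_point(distance_m, place_bearing, device_heading, range_m, radius_px):
    """Offset (dx, dy) in pixels from the radar centre at which to draw a place.

    The direction the user is facing is drawn at the top of the radar; screen y
    grows downwards, so "ahead" has negative dy. Places beyond range_m are hidden.
    """
    if distance_m > range_m:
        return None
    # rotate the world so that the device heading points up on the radar
    angle = math.radians(place_bearing - device_heading)
    r = distance_m / range_m * radius_px  # scale metres to pixels
    return (r * math.sin(angle), -r * math.cos(angle))
```

As the user rotates, only `device_heading` changes, so the icons sweep around the centre like a compass, which matches the behaviour described above.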


Fig. 6. Radar View Function


Zoom Bar Function. As mentioned previously, the radar view displays objects within a certain range around the user. To let the user control this range, the zoom bar sets the radius of data collection from 0 m to 20,000 m (20 km). To display the zoom bar, the user touches the radar view in the "Manasek" tab, and the zoom bar immediately appears on the right side of the screen.
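A minimal sketch of the zoom bar's effect, assuming (hypothetically) that the bar maps linearly onto the 0 m to 20,000 m radius and that each place already carries its distance from the pilgrim. The function names are illustrative.

```python
MAX_RADIUS_M = 20_000.0  # upper end of the zoom bar (20 km), from the text


def radius_from_slider(fraction):
    """Map a zoom-bar position in [0, 1] to a search radius between 0 m and 20 km."""
    fraction = min(max(fraction, 0.0), 1.0)  # clamp out-of-range touches
    return fraction * MAX_RADIUS_M


def places_in_range(places, radius_m):
    """Keep only (name, distance_m) pairs that lie within the selected radius."""
    return [(name, d) for name, d in places if d <= radius_m]
```

A linear mapping is the simplest choice; a logarithmic one would give finer control at short distances, but the text does not say which the application uses.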


Fig. 7. Zoom Bar Function


140 M. Taileb et al. 


Add Campaign. When the user taps the "Campaign" tab, the camera view appears. The user must be at the location of the campaign to add the campaign marker, because the marker is placed at the user's current location. This procedure is illustrated in Figure 8.


Fig. 8. Add Campaign Function


Delete Marker from the Map. When the user adds a marker via the Add Campaign function, the marker appears on the map. To delete this marker, the user follows these steps:


• Long-press the marker.

• An alert dialog appears, as shown in Figure 9.
• Choose Yes to delete the marker.

• Choose No to keep the marker and cancel the operation.


Fig. 9. Delete Marker: View Map Function
News. When the user taps the "News" tab for the first time, a blank page is displayed. The user then pulls down the page to fetch the latest news from Twitter, which is returned and presented as a list, as shown in Figure 10. The user can pull down the page again at any time to see the latest news from Twitter.


Fig. 10. News Function


The proposed application is similar to the AR browser Wikitude [13]. The differences between Manasek AR and Wikitude are as follows:


• Manasek AR's database content is static and is updated under the supervision of the developers, whereas Wikitude's content is mostly user-generated.

• Manasek AR is customized only for Hajj and Umrah places and covers them in detail, whereas Wikitude is a general world browser.

• Manasek AR provides different types of information. It allows pilgrims to choose the type of information: the locations of Hajj and Umrah places, historical information about these places, or guidance on how to perform the rituals related to a particular place. In addition, other services are available to help pilgrims during Hajj and Umrah.

• An Internet connection is not required for most of the services of Manasek AR, because the static database contains all the information about the saved places.
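The offline behaviour in the last point can be illustrated with Python's built-in `sqlite3` module. The table name, columns, and the three `info_type` values are hypothetical, chosen to mirror the three information types described earlier; the text does not say how the application actually stores its data.

```python
import sqlite3

# Hypothetical schema: one text entry per place and information type.
SCHEMA = """
CREATE TABLE IF NOT EXISTS place_info (
    place     TEXT NOT NULL,
    info_type TEXT NOT NULL CHECK (info_type IN ('location', 'manasek', 'historical')),
    body      TEXT NOT NULL,
    PRIMARY KEY (place, info_type)
)
"""


def open_local_db(path=":memory:"):
    """Open the on-device database (in memory by default); no network is used."""
    conn = sqlite3.connect(path)
    conn.execute(SCHEMA)
    return conn


def lookup(conn, place, info_type):
    """Return the stored text for a recognized place and chosen information type."""
    row = conn.execute(
        "SELECT body FROM place_info WHERE place = ? AND info_type = ?",
        (place, info_type),
    ).fetchone()
    return row[0] if row else None
```

In a deployed app, `path` would point at a database file shipped with the application and refreshed by the developers, which is what makes the content static and available offline.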


4 Usability Testing Results 


The usability testing involved several users from outside the development team who had no prior knowledge of the application; they were given a set of tasks covering all interface functionalities. The tasks are listed in Table 2. During the test, the testers observed the users' behaviour. The results were excellent: most users performed the tasks without confusion.
Table 2. Usability Testing Results


Task                              Comment

View help                         Done correctly
Hide help                         Done correctly
View map                          Done correctly
Show the location information     Done correctly
Show the Manasek information      Done correctly
Show historical information       Done correctly
Hide the information box          Done correctly
Show the zoom bar                 Done correctly
Hide the zoom bar                 Done correctly
Add campaign                      Done correctly
Delete campaign                   Some delay; the developer solved this problem.
View info                         Done correctly
View Twitter timeline             Done correctly


System testing was performed in the Holy Mosque in Mecca. The system recognized all the places contained in the database, gave the right direction, and displayed the correct information for the selected information type.

The usability testing involved 35 users from outside the development team. They performed tasks covering all the interface components. The results were very good: most users performed the tasks without confusion and gave positive feedback.


5 Conclusion and Future Work 


Manasek AR is a location-based augmented reality application that provides complete guidance for pilgrims. It displays all the needed information about the pilgrim's surroundings in a mobile camera view. It also allows pilgrims to add their campaign location, view maps, and get the latest Hajj news through the official Twitter account of the Ministry of Hajj and Umrah. Manasek AR's goal is to seize the opportunity of using AR technologies to improve the Hajj and Umrah experience for pilgrims and help them overcome the difficulties they face. An evaluation was carried out to examine its ability to correctly display information about places in Mecca. The evaluation ended successfully, with a very high rate of positive outcomes. As future work, the system can be extended to cover other cities and places in Saudi Arabia.
Abstract. Augmented reality (AR) technology can show additional information by superimposing virtual objects onto the real world. AR is increasingly used in learning environments for observing objects that cannot otherwise be seen. Observation is the important process of inspecting a target object in significant detail, and it forms the basis of all scientific knowledge in education. However, only a few AR applications can visualize the temporal changes of objects, and the effect of such temporal change visualization by AR has not been investigated scientifically. In this study, in order to clarify the effect of temporal change visualization by AR, we experimentally compared an AR-based temporal change visualization method with conventional temporal change visualization methods, using the observation of plant growth as a practical scenario. Through the experiment, we confirmed that superimposing the past appearance of an object onto the user's viewpoint is effective for observing temporal changes.


Keywords: Augmented Reality, Temporal Change Visualization, Learning Support.


1 Introduction 


Augmented Reality (AR) is a technology that integrates virtual elements into a real environment with which the user can interact in real time. According to the definition in the literature [1], AR has three characteristics: it combines real and virtual, it is interactive in real time, and it is registered in 3D. Owing to the interaction, visualization, and annotation features provided by AR, it has been successfully applied and explored in many fields, such as entertainment, training, commerce, and education. In particular, many AR applications for education have been developed over the past decades and their usefulness has been explored. However, the effectiveness of AR in the learning process has yet to be explored and evaluated based on learning theories. Furthermore, only a few existing AR prototype systems have been created based on these theories to provide Augmented Reality Learning Experiences (ARLEs).

Experiential learning theory describes learning as a four-stage cycle and emphasizes the importance of experience in the learning cycle. Contextual learning, a curriculum design philosophy, concurs on the importance of experiential learning: it holds that learning only takes place when students process new information in relation to their personal experiences [3]. The cycle of experiential learning theory (Figure 1) usually starts with a concrete experience, followed by data collection through observation. The collected data is then analyzed to form abstractions and generalizations, which are tested in new situations. This testing stage starts the cycle again, giving the student another set of concrete experiences. In traditional classroom learning, abstractions and formulas are taught by teachers and textbooks, while the stages of concrete experience and of observation and reflection are limited. These two stages are usually carried out through field trips and experiments, which can only be done within limited time and cannot easily be repeated or accessed. AR, on the other hand, can provide learning experiences anytime and more flexibly. This research focuses on observation not only because AR, as a display technology, can perform various visualization methods, but also because observation is measurable: we can quantify it by evaluating how much information has been collected and whether this information is better than information collected using non-AR methods. We also noticed that there are difficult observation scenarios in which the observer needs assistance. We classify these scenarios into three types: limitation of senses, occlusion, and temporal changes. Among them, observation under temporal changes is the least explored and has the fewest related works supporting visual comparison. Overall, the goal of this research is to develop a prototype system that supports observation under temporal changes in classroom learning and to evaluate temporal change visualization methods.


(Cycle: concrete experience → observations and reflections → formation of abstract concepts → testing concepts in new situations → concrete experience.)


Fig. 1. The Experiential Learning Theory


2 Observation Support by AR 


In the observation process, the observer collects data and information about an experience. However, there are several difficult observation scenarios in classroom learning, for example observing the planets of the solar system or cells, human organs, or physical collisions. In this section, we briefly review the literature on observation support by AR. These works can be classified into three groups by the aim of the application: limitation of senses, occlusion, and temporal changes.


2.1 Limitation of Senses 


Humans have sense organs, complicated structures that provide perception and sensation of the environment. However, there are things that cannot be sensed due to the limitations of our senses. In many cases we can increase our sensory capabilities by using physical measuring devices. AR, which combines the real and the virtual and serves as a display technology, has been used to compensate for visual difficulties or to enhance vision. In recent years, it has been used as a tool to reduce these limitations by visualizing subjects that are invisible to the naked eye.

The Real-time Visualization System [6] is an AR education tool that combines traditional experiments and computer simulation. Traditional experiments use iron sand, which is time-consuming and cannot deal with complicated models. Although such models can be handled by computer simulations, these are difficult for novice users and provide little or no interaction. The Real-time Visualization System overcomes these disadvantages by allowing students to observe magnetic fields and move objects to change the field in real time. Another such system is used to teach organic chemistry by visualizing electron activity and dipole moments. In traditional teaching, printed materials (e.g., graphics on paper or in books) or molecular models are used. Printed materials offer a wide variety of images but are limited to 2D pictures, while molecular models offer less variety but display a 3D structure. With the system, students can choose elements from a booklet, and the system generates three-dimensional (3D) molecular models.

These systems support observation under the limitations of human senses by using AR to visualize the subjects, allowing students to observe them in an approachable way. However, as mentioned previously, these systems were not designed based on learning theories, and no user experiments were conducted to evaluate their usefulness and effectiveness.


2.2 Occlusion


When occlusion occurs, we need to physically remove the obstruction in order to make an observation. For example, to see an underground sewage system, we need to break the ground to expose the buried pipes. AR, as a display technology, has been used to visualize occluded objects without the need to remove the obstruction.

The mirracle system visualizes CT datasets for anatomy education. Using the system, the user can see inside the human body without dissection courses, which are often difficult to arrange and require a lot of effort. XRay-AR is a visualization method implemented in AR applications that shows a see-through effect [8]. As with systems addressing the limitation of senses, the mirracle system uses AR to visualize hidden subjects, so complicated experiments and troublesome processes can be omitted. However, these approaches share the same drawbacks: the mirracle system was not designed based on learning theories and no experiments were conducted, and XRay-AR has not been formally implemented in a particular educational AR prototype system. Therefore, we cannot determine how learning is affected by the system or this technology.


2.3 Temporal Changes 


Temporal changes are changes of the subject that happen over time, for example, changes in the height and weight of a human being, or changes in the color of leaves with the seasons. To observe temporal changes, the observer needs to pay careful attention to the subject to notice the differences between its states. Observing temporal changes is difficult because of the time factor: we cannot see multiple states at the same time and compare the differences between them. One of the most common methods of observing temporal changes is visual comparison. Forsell et al. state that "visual comparison tasks take a central role in visual data exploration and analysis" [2]. In that paper, the authors also describe three phases of comparison:


1. Selection of pieces of information to be compared, 
2. Arrangement of the pieces to suit the comparison, and 
3. Carrying out the actual comparison. 


With AR, the first two phases are achieved automatically by the system. Virtual Vouchers is an example of how AR assists visual comparison in non-classroom learning. In the field, when botanists need to identify a collected specimen or verify the existence of a new species, they initially consult their own knowledge and a paper field guide. The paper field guide might not contain the full specimen collection or species samples, and it is difficult to use. The Virtual Vouchers system instead allows the user to access and view a large amount of data and display it side by side with physical specimens.

City ViewAR [] is another example of non-classroom learning that supports the observation of temporal changes through visual comparison. This system superimposes the street view from before the 2011 Christchurch earthquake onto the buildings that remain, so that students can compare the scenes before and after. The Campus Butterfly Ecology Learning System allows students to observe virtual butterflies simulated and augmented in a campus view. Unlike the aforementioned systems, which are used for field-trip learning, the Campus Butterfly Ecology Learning System is used in regular classes. However, it only provides simulation, not visual comparison.

In this research, we focus on supporting the observation of temporal changes for the following reasons. First, this category is less explored, with fewer related works supporting visual comparison of temporal changes in classroom learning. Second, to support the observation of temporal changes, not only the subjects and their states matter but also how to "control" time. We propose that the observation of temporal changes can benefit from AR technology and that a pioneering evaluation is necessary to determine its effectiveness in learning.


3 Visualization of Temporal Changes by AR 


In this study, in order to visualize the temporal changes of an object, we propose view-morphing-based superimposition, which displays the past appearance of the object. Generally, in AR, superimposed objects are represented by 3D models. However, it is difficult for novice users to make a 3D model of the target object, especially in learning environments where typical users may be children. To avoid making a 3D model of the target object, we employ the view-morphing method [9] to generate synthesized images. The view-morphing method can virtually generate an arbitrary view image from an image pair. To realize view-morphing-based superimposition, the proposed method is composed of an offline image database construction process and an online process of novel view image generation and superimposition, as shown in Figure 2. We assume an environment in which fiducial markers are arranged around the target object and the relative position between the target object and the fiducial markers is fixed. In this section, we describe the details of these processes.


Fig. 2. The flow diagram of the proposed system
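The idea that makes view morphing work, in its classic formulation for two views whose image planes have been made parallel by prewarping, is that linearly interpolating the positions of corresponding points yields a geometrically valid in-between view. The NumPy sketch below shows only that interpolation step, plus a hypothetical helper that derives the interpolation parameter from where a virtual camera projects onto the baseline. It does not reproduce prewarping, image resampling, or the simplified variant proposed here.

```python
import numpy as np


def morph_points(pts0, pts1, s):
    """Interpolate matched points of two prewarped (parallel) views.

    pts0, pts1: (N, 2) arrays of corresponding feature positions in images 0 and 1.
    s: 0 reproduces view 0, 1 reproduces view 1, values in between give new views.
    """
    if not 0.0 <= s <= 1.0:
        raise ValueError("s must lie in [0, 1]")
    pts0 = np.asarray(pts0, dtype=float)
    pts1 = np.asarray(pts1, dtype=float)
    return (1.0 - s) * pts0 + s * pts1


def baseline_parameter(c0, c1, c):
    """s for a virtual camera c, from its projection onto the baseline c0 -> c1."""
    c0, c1, c = (np.asarray(v, dtype=float) for v in (c0, c1, c))
    baseline = c1 - c0
    return float(np.dot(c - c0, baseline) / np.dot(baseline, baseline))
```

Pixel colours would be cross-dissolved with the same weight `s`; that step is omitted here to keep the sketch short.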


3.1 Construction of Image Database 


In the offline phase, the user is asked to take multiple photos of the target object from different camera positions and angles, together with the time $t$ at which they are taken. The camera pose $C_j$ of each captured image $I_j$ is calculated using the fiducial markers. In addition, feature points are extracted from the captured image, and corresponding pairs of feature points between the captured image and the images already in the database are searched. Finally, for each photo, the camera pose $C_j$, the image data, and the correspondence information of the natural features are registered in the database. Background information is removed from the registered image data by simple background subtraction using known background color information.
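The background-removal step could look roughly like the NumPy sketch below. It assumes a single, uniform backdrop colour and a hypothetical colour-distance threshold `tol`; the text only says that a simple subtraction with known background colour information is used.

```python
import numpy as np


def remove_background(image, bg_color, tol=30.0):
    """Make pixels close to a known, uniform background colour transparent.

    image: (H, W, 3) uint8 RGB array; bg_color: (r, g, b) of the backdrop.
    tol is a hypothetical Euclidean distance threshold in RGB units.
    Returns an (H, W, 4) RGBA image whose background pixels have alpha 0.
    """
    rgb = image.astype(np.float32)
    dist = np.linalg.norm(rgb - np.asarray(bg_color, dtype=np.float32), axis=-1)
    alpha = np.where(dist > tol, 255, 0).astype(np.uint8)
    return np.dstack([image, alpha])
```

Keeping the result as RGBA means the later superimposition can simply skip transparent pixels.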
3.2 Visualization of Temporal Change by Novel View Generation


In the online phase, the user first manually selects the target time for comparison, and AR images are then generated iteratively. In the AR view generation process, an image pair for novel view image generation is selected from the database as follows.


1. Registered images are filtered into candidate images based on the angle between their optical axes and the current camera view direction.

2. The candidate images are filtered by the distance between the current camera position and their registered positions.

3. The two nearest camera positions located on the left- and right-hand sides of the current camera position are selected.
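The three selection steps can be sketched as follows. This is an illustrative reading of the list, not the authors' code: the database layout, the angular and distance thresholds, and the use of the vertical component of a cross product to tell left from right (with z pointing up) are all assumptions.

```python
import numpy as np


def select_image_pair(db, cam_pos, cam_dir, max_angle_deg=30.0, max_dist=None):
    """Return (left_id, right_id): the nearest registered views on each side.

    db: iterable of (image_id, position, view_direction) with 3-vectors.
    max_angle_deg and max_dist are hypothetical thresholds (none are given in
    the text). A side with no candidate yields None. Assumes z is the up axis.
    """
    cam_pos = np.asarray(cam_pos, dtype=float)
    cam_dir = np.asarray(cam_dir, dtype=float)
    cam_dir = cam_dir / np.linalg.norm(cam_dir)
    cos_limit = np.cos(np.radians(max_angle_deg))
    left, right = [], []
    for image_id, pos, direction in db:
        d = np.asarray(direction, dtype=float)
        # Step 1: keep views whose optical axis is close to the live view direction.
        if np.dot(d, cam_dir) / np.linalg.norm(d) < cos_limit:
            continue
        offset = np.asarray(pos, dtype=float) - cam_pos
        dist = float(np.linalg.norm(offset))
        # Step 2: drop views registered too far from the live camera position.
        if max_dist is not None and dist > max_dist:
            continue
        # Step 3: a positive z component of (view dir x offset) means "to the left".
        side = np.cross(cam_dir, offset)[2]
        (left if side > 0 else right).append((dist, image_id))

    def nearest(group):
        return min(group)[1] if group else None

    return nearest(left), nearest(right)
```

If either side returns None, no bracketing pair exists for the current viewpoint and a novel view cannot be interpolated.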


After the image selection process, the view-morphing process is executed. The original view-morphing method [9] consists of prewarping the image pair, morphing the prewarped images, and postwarping the morphed image. Because that method is designed to generate a novel view image without intrinsic and extrinsic camera parameters, it needs two image warping processes. In our implementation, by contrast, the camera poses and intrinsic parameters of the image pair are known, and so are the camera pose and intrinsic parameters of the input image from the live video feed. Using this known information, we can simplify the original view-morphing method: the postwarping process is removed by generating the morphed image at the camera position of the input image, as shown in Figure 3. The concrete view-morphing process is as follows.


1. Get the plane $\pi$ through three points: $C_0$, $C_1$, and $C_i$.

2. Derive the line $PC_i$, which is the intersection of two planes: plane $\pi$ and plane $x = 0$.

3. Get points $m_{0i}$ and $m_{1i}$ ($i = 0, 1$), the points on $I_0$ and $I_1$ that correspond to the end points of the epipolar lines (projected onto line $PC_i$).

4. Calculate the intersection range of $m_{0i}$ and $m_{1i}$ and the average point $m$ of this range.

5. Get point $C'_0$, the intersection of line $mC_0$ and plane $z = 0$, and point $C'_1$, the intersection of line $mC_1$ and plane $z = 0$.

6. Project images $I_0$ and $I_1$ from $C_0$ and $C_1$ to $C'_0$ and $C'_1$.


$C_0$, $C_1$, and $C_i$ represent the camera positions of database image 0, database image 1, and the input image, respectively. Finally, the generated novel view image is superimposed onto the input image, as shown in Figure 4.
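Steps 1 and 5 rest on two standard geometric primitives: the plane through three points and the intersection of a line with a plane. Generic NumPy helpers for both are sketched below; they are not the authors' implementation, and representing a plane by a unit normal `n` and offset `d` with n · x + d = 0 is an assumption of this sketch. With them, step 1 is `plane_through(C0, C1, Ci)`, and step 5 is `line_plane_intersection(m, C0, plane)` with, for example, `(np.array([0.0, 0.0, 1.0]), 0.0)` for the plane z = 0.

```python
import numpy as np


def plane_through(p0, p1, p2):
    """Plane through three points as (n, d), with unit normal n and n . x + d = 0."""
    p0, p1, p2 = (np.asarray(p, dtype=float) for p in (p0, p1, p2))
    n = np.cross(p1 - p0, p2 - p0)
    norm = np.linalg.norm(n)
    if norm == 0:
        raise ValueError("points are collinear")
    n = n / norm
    return n, -float(np.dot(n, p0))


def line_plane_intersection(a, b, plane):
    """Point where the line through a and b meets the plane (n, d)."""
    n, d = plane
    a, b = np.asarray(a, dtype=float), np.asarray(b, dtype=float)
    direction = b - a
    denom = np.dot(n, direction)
    if abs(denom) < 1e-12:
        raise ValueError("line is parallel to the plane")
    t = -(np.dot(n, a) + d) / denom
    return a + t * direction
```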


4 Experiment 


We compared the effectiveness of the proposed visualization method for observation under temporal changes with that of other visualization methods through a user study. In this experiment, we used the observation of plant growth as a practical scenario.
Fig. 3. View-morphing in the proposed method


(Panels: left image, right image, novel view image, AR image.)


Fig. 4. Example of input and generated images
4.1 Experimental Conditions


Different types of visualization methods are suitable for different difficult observation scenarios. Forsell et al. studied three approaches inspired by natural ways of handling printed paper: side by side, shine-through, and folding [2]. Unlike paper-based comparison, folding cannot be achieved when comparing a physical 3D object. However, side-by-side and shine-through comparison can be accomplished by traditional methods as well as by AR technology. In this experiment, we therefore tested seven visualization methods in the side-by-side and shine-through (overlay) categories for observation and comparison, as shown in Figure 5. The characteristics of each visualization method are described below.


1. Side by Side Based Visualization
Printed images (Method A): Images printed on paper are compared with the subject by placing the papers beside it. Participants flip to the images they want to use for comparison.
Displayed images (Method B): Participants compare by placing the subject beside the computer screen on which the images are displayed. The displayed images are browsed with the up and down arrow keys on the keyboard.
Displayed limited images on camera image (Method C): The system shows one of the registered images beside the subject according to the subject's orientation. In this condition, the view-morphing function is disabled. Participants may turn the subject around to observe it from different angles.
Displayed novel view images on camera image (Method D): Participants use the system with the view-morphing function for comparison. The system generates in-between images based on the saved information and the current camera position.

2. Overlay Based Visualization
Printed transparent pictures (Method E): Images printed on transparencies are compared with the subject by holding the transparencies in front of it. Comparison is carried out between the subject and one image rendered beside it using the AR system.
AR with limited pictures (Method F): The subject is compared with a registered image rendered over it using the AR system.
AR with free viewpoint (Method G): The subject is compared with the novel view image generated by view morphing and rendered over it using the AR system (the proposed system).
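On screen, the side-by-side AR methods and the overlay AR methods differ mainly in how a past image is composited with the live camera image. The NumPy sketch below contrasts the two under simple assumptions: images are uint8 arrays, the past image for overlay is an RGBA image already aligned with the live frame and of the same size, and the 0.5 opacity is a hypothetical value not given in the text.

```python
import numpy as np


def side_by_side(live, past):
    """Side-by-side style (as in Methods C/D): past image next to the live image."""
    h = min(live.shape[0], past.shape[0])  # crop to a common height
    return np.hstack([live[:h], past[:h]])


def overlay(live, past_rgba, opacity=0.5):
    """Overlay style (as in Methods F/G): blend the past image over the live view.

    live: (H, W, 3) uint8; past_rgba: (H, W, 4) uint8 aligned with live.
    Fully transparent pixels of the past image leave the live view unchanged.
    """
    alpha = past_rgba[..., 3:4].astype(np.float32) / 255.0 * opacity
    blended = (1.0 - alpha) * live.astype(np.float32) + alpha * past_rgba[..., :3].astype(np.float32)
    return blended.round().astype(np.uint8)
```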


In the experiment, seven targets were provided to the participants in turn. Several days before the experiment, 48 images were taken around the target object over 360 degrees at 7.5-degree intervals and saved in the database to represent its past state. The participants were required to use these images for observation and comparison.
Fig. 5. Seven different visualization methods


4.2 Design of the User Experiment 


In the experiment, each participant was required to observe the subject (a plant), find the changes, and, after all the trials, rank all seven methods based on the ease of observation and comparison. In each trial, the participant received a set of images presented with a different visualization method and was asked to use these images for observation and comparison.

The measurements include the quantity and the accuracy of information. The quantity is measured by how many changes the participant notices, and the accuracy by how many of the identified changes are correct and how many are incorrect. We also asked the participants about the advantages and disadvantages of each visualization method. The changes in these plants include new buds, fallen leaves, and changes in the angles of outer leaves.
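As a rough illustration of the two measures, assuming each reported change and each true change can be matched by an identical label (in practice the matching would be judged by the experimenters), the counts could be computed as follows. The function name and label format are hypothetical.

```python
def score_observation(reported, ground_truth):
    """Count the changes a participant noticed and how many were correct.

    reported, ground_truth: iterables of change labels (e.g. "new bud 1").
    Returns (quantity, correct, incorrect).
    """
    reported = set(reported)
    truth = set(ground_truth)
    correct = len(reported & truth)
    return len(reported), correct, len(reported - truth)
```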

The procedure of the experiment was as follows. First, a brief interview was held with each participant to gather basic information, including gender, age, and any prior experience with AR applications. Second, the experiment was explained, including its purpose, the participant's tasks, and how the systems work. When ready, the participant started to observe and compare using the target plant and the provided visualization method, marking the changes s/he found on marking sheets. At the end of each visualization session, the participant completed a short questionnaire with five self-report questions and a section for comments on the advantages and disadvantages of the visualization method. After all seven trials, the participants ranked all the visualization methods based on ease of observation and comparison. Lastly, they were asked to write comments on the visualization methods and the experiment. Overall, the experiment took about one and a half hours, including the post-experiment questionnaires. The five self-report questions in the questionnaire of each visualization session are listed below; Q1 to Q5 denote the first to fifth questions.
Q1. I think it is easy to notice the changes with this visualization method.
Q2. I think I found all the changes. 

Q3. I think it is easy to see the changes of color. 

Q4. I think it is easy to see the changes of height. 

Q5. I think it is easy to see the changes of angle. 


Answers to each question were given on a Likert scale from 1 (strongly disagree) to 5 (strongly agree). The ranking scores of the seven visualization methods range from 7 (the best) to 1 (the worst).
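Since each participant assigns the scores 7, 6, ..., 1 to the seven methods exactly once, every participant distributes 7 + 6 + ... + 1 = 28 points, so the seven mean ranking scores always average to 4. The hypothetical helper below converts orderings into mean scores, which makes that consistency check easy; any orderings fed to it in testing are invented, not experimental data.

```python
from statistics import mean

METHODS = "ABCDEFG"


def mean_ranking_scores(rankings):
    """Convert each participant's ordering (best first) into scores 7..1 and average.

    rankings: list of strings, each a permutation of METHODS, best method first.
    """
    scores = {m: [] for m in METHODS}
    for order in rankings:
        if sorted(order) != sorted(METHODS):
            raise ValueError(f"not a ranking of all seven methods: {order!r}")
        for position, method in enumerate(order):
            scores[method].append(len(METHODS) - position)
    return {m: mean(v) for m, v in scores.items()}
```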


4.3 Result of the User Experiment 


The experiment involved 11 participants (3 female and 8 male) with an average age of 29. Six of the participants had no AR development experience but had participated in AR-related experiments before; the other five participants had AR development experience. Each visualization method session took up to 6 minutes, and the whole experiment took 90 minutes on average, including the post-experiment questionnaire.

Table 1 shows the mean questionnaire scores for each visualization method. For
question 1, visualization Method F has the highest mean score (3.73) and Method A has
the lowest (1.91). Methods C and G share the highest mean score (3.00) for question 2.
Methods C, F and G have the highest scores for ease of noticing changes in color, height
and angle, respectively. In the mean ranking score, visualization Method C (side by side
with non-view morphing) has the highest score (5.27), followed by visualization Method D
(side by side with view morphing AR application) with a mean score of 4.73. Visualization
Methods A (side by side with images printed on paper) and B (side by side with images
displayed on a computer screen) share the lowest mean score (2.91).


4.4 Discussion 


Table 1 shows that detecting color changes is easier with visualization Methods C and D,
and more difficult with Methods A and E. The methods scored higher for noticing changes
in height are Methods F and C, while Methods B and E are scored lower.


Table 1. The mean scores of questionnaires and the mean scores of ranking for each visualization
method. Bold font indicates the highest scores in each question.

Method      Q1     Q2     Q3     Q4     Q5     Score of ranking
Method G    3.27   3.00   2.73   3.36   4.00   4.45
The methods that are easier for noticing changes in angle are Methods C, F and G,
whereas the more difficult ones are Methods A, B and E. According to the user ranking
results, Methods C, D and G were scored higher and Methods A and B scored the lowest.
These scores are reflected in the ease-of-observation scores (Q1) for each method:
Methods C, F and G also scored the highest, while Methods A and B scored the lowest.
Combining these two results, we confirmed that users preferred the AR methods (C, D, F
and G) over the non-AR methods (A, B, E) for ease of observation.
Table 2 shows the mean detection accuracy of each method. In terms of change-detection
accuracy, Method A has the highest accuracy, while Method E has the lowest. Even though
Method A has the best accuracy, participants did not find it easy to use for observation
and comparison. This outcome might be explained by our familiarity with handling and
comparing paper materials in daily life. However, compared to the other visualization
methods, it is considerably more time-consuming and difficult to manipulate, and it
requires the user to do everything manually. In addition, Method F has a higher detection
error than Method G. We conjecture that this result might be caused by occlusion: four
participants reported that, while using Method F for observation and comparison,
occlusion occurred and interrupted their comparison process. We think the occlusion
effect is suppressed by the view morphing in Method G.
Throughout the experiment, we noticed that systems with and without view morphing
yielded very similar results. We believe this is because the participants were able to
access as many as 48 images (one every 7.5 degrees around the target object). These
images did not differ much from the images created by view morphing, since the change of
angle between them was very small. However, the detection error results of the system
with view morphing suggest that view morphing could improve observation.
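The arithmetic behind this explanation can be sketched as follows: with 48 registered images spaced evenly around the object, consecutive views are 360/48 = 7.5 degrees apart, so any requested viewpoint lies at most 3.75 degrees from its nearest registered image. The sketch assumes evenly spaced images over a full circle and nearest-image selection; the function names are hypothetical and this is not the authors' implementation.

```python
def nearest_registered_view(angle_deg, num_images=48):
    """Return (index, offset_deg) of the registered image closest to angle_deg.

    Assumes num_images views spaced evenly over 360 degrees, image 0 at 0 degrees.
    """
    step = 360 / num_images
    angle = angle_deg % 360
    index = round(angle / step) % num_images  # wrap 360 degrees back to image 0
    diff = abs(angle - index * step)
    offset = min(diff, 360 - diff)            # shortest angular distance on the circle
    return index, offset

def worst_case_offset(num_images):
    """Largest offset to the nearest image, sampled every 0.05 degrees."""
    return max(nearest_registered_view(i / 20, num_images)[1] for i in range(360 * 20))

print(nearest_registered_view(10))  # (1, 2.5)
print(worst_case_offset(48))        # 3.75
print(worst_case_offset(12))        # 15.0
```

Under these assumptions, reducing the number of registered images widens the worst-case angular gap proportionally; a hypothetical 12-image set leaves viewpoints up to 15 degrees from the nearest image, which is the kind of condition in which view morphing would have more room to help.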


5 Conclusion 


In this research, we identified the most effective visualization method for observation
under temporal changes. The results of our experiment showed that all camera-image-based
visualization methods, including the AR-based visualization methods, scored higher than
methods without camera images. In the future, the quality of the synthesized images and
the resolution of the camera need to be improved. The systems with and without view
morphing yielded similar results. This might be caused by the number of images provided
to the participants, which is larger than in usual cases. As a result, the selected images
may not differ very much from the view morphing images. We need to conduct an additional
experiment with a reduced number of registered images for the non-view morphing system,
which is closer to actual comparison, to further verify our assumption.


Table 2. The mean accuracy of changes detection of each method

                                Method A  Method B  Method C  Method D  Method E  Method F  Method G
Accuracy of changes detection   0.85      0.84      0.77      0.84      0.67      0.77      0.76
Number of detection error       13 12 12 12 19 12
Abstract. In this paper, we describe the program of our AR workshops dedicated
to art students. We present our observations on supervising such lab courses and
on the students’ works. We would like to present a methodology for AR training
when the students are not experienced in computer programming. We hope this will
encourage other art and IT teachers to join efforts and incorporate AR into the
curriculum as a very promising concept of merging technology with visual
communication. The potential of AR is very high and, therefore, it is important to
introduce students to AR and to the process of creating their own working projects.


Keywords: Augmented Reality, education, AR workshops, art projects, mobile 
AR. 


1 Introduction 


Augmented Reality (AR) applications and services are becoming very popular nowadays,
mainly due to the spread of powerful mobile devices and new concepts such as
Project Glass by Google¹. AR projects are no longer just laboratory concepts but
existing solutions supporting a constantly growing number of complex tasks, navigation
systems, education, entertainment etc. Therefore, AR is very close to becoming a
household term and is visible in audio-visual media like games, TV, e-learning etc.
According to the 2013 Horizon Report², the use of wearable technology will increase,
which will accelerate the expansion of technologies such as augmented reality in the
consumer market and the educational sector.

In 2013, we began to hold AR workshops for international groups of art students
(Poland: Academy of Fine Arts in Katowice; Belgium: Antwerp Royal Academy of
Art³; Finland: Aalto University). Our experience gained during these classes showed
that teaching both IT tools and visual communication design was very beneficial to
students. Merging the knowledge and passion of lecturers from two different faculties,
IT (Marcin Wichrowski, Alicja Wieczorkowska) and New Media Art (Ewa Satalecka),
shows that such collaboration can yield very interesting projects.


¹ http://www.google.com/glass/start/
² http://www.nmc.org/publications/2013-horizon-report-higher-ed
³ http://grafischevormgevers.be/projecten?locale=en_US&wppa-album=1&wppa-photo=1&wppa-cover=0&wppa-occur=1
2 Related Works


The methodology of teaching AR at universities [1], including the authors’ previous
papers [2], and the attempt to improve the techniques applied so far serve as the
motivation for this work. The importance of AR in education is presented in many
sources [5], [6], [7]. We propose a methodology based on the findings of [1] and [2],
and on our experience from the workshops we conducted for art students.


3 What Is AR? 


When we look at the taxonomy of Mixed Reality (Fig. 1), we can observe that AR is a
form of Mixed Reality quite close to the Real Environment. In contrast to VR, which
completely immerses a user in a computer-generated world, AR enriches the real world
with computer-generated content. This content may be 2D and 3D objects, audio or video
files, textual information, avatars, interactive interfaces etc. The user can interact
with these virtual objects, which are superimposed upon or seamlessly mixed with the
real world. AR supplements reality rather than completely replacing the world around
the user. It allows real and virtual elements to coexist in the same time and space.


[Figure: the Mixed Reality (MR) continuum from Real Environment to Virtual Environment, covering Augmented Reality (AR) and Augmented Virtuality (AV)]


Fig. 1. Mixed Reality [3]


The definitions given by Azuma [4] and Kaufmann [5] specify the implementation of AR
by three elements:


• a mix of real-world and computer-generated virtual elements,
• interaction provided in real time,
• elements registered in 3D.


We can experience AR using a desktop computer, a mobile device or a special Head-
Mounted Display (HMD) (Fig. 2). Because of the high availability and rapid development
of handheld devices capable of delivering AR content, we decided to focus our
workshops on mobile solutions only.
[Figure: a) computer with a visual marker; b) smartphone/tablet showing an augmented image over a visual marker; c) Head-Mounted Display]


Fig. 2. AR experience on a) a computer, b) a mobile device, c) a Head-Mounted Display


4 Combining Artist’s and IT Perspective in Supervising AR 
Projects 


The idea of interdisciplinary workshops combining artistic and IT approaches emerged
during joint supervision of the final works of students of the New Media Art Department
at the Polish-Japanese Institute of Information Technology. During the cycles of
workshops we held for international groups of students in Poland and later in Belgium
and Finland, we tried to observe how the participants use this technology in graphic
design. First we gave them a topic, usually quite an open one like “a love poem” or
“a message from Finland,” then we introduced the processing theory and presented the
tools. We began teaching with a group discussion; next we divided responsibilities: an
artist lecturer for graphic design quality and an IT specialist lecturer for the quality
of the engine and flawless delivery. Working as a team of tutors, we had the opportunity
to discuss the design and IT aspects of each project in parallel and individually. This
seemed to be a comfortable situation both for us and for the students. The final results
were satisfying and made us realize how these workshops triggered inventive and
creative works by the participants of our classes. All of the projects presented were
done over four to five days of workshops with participants of various levels of
technical advancement. Mixed-level groups were even more progressive, well-working
and self-supporting classes. We realised that this methodology could be recommended
for workshops as a very effective method of skills development and stimulation for the
youngest participants. Age differences, various levels of ability and different project
topics encourage participants to exchange their knowledge and experience.

From the IT point of view, the main challenge behind these workshops was finding a
balance between the complexity of the tasks given and the students’ level of experience
in understanding multimedia creation dedicated to AR projects. From our observations,
most art students are good at preparing raster/vector graphics and animations, mainly
using Adobe products like Photoshop, Illustrator and After Effects. Some of them also
specialize in 3D computer graphics and 3D modeling. Programming skills are rare.
However, our experience shows that an aesthetic and well-designed AR project, even with
basic interaction, can also be engaging for most users.


5 Teaching Methodology in Detail 


The methodology we propose for AR workshops for art students is based on 5 days of
training, 7 hours a day, for groups of up to 16 students. The main part is a typical IT
workshop using computers, but students may bring drawings, paper mock-ups etc. The
technical requirements for the workshops concern installing desktop and mobile
applications and providing access to smartphones/tablets (Android/iOS) with an Internet
connection, webcams, digital photo cameras and a color printer. After completing the
workshop, students should have acquired the following competences:


• Knowledge of using AR technology, including its pros and cons.
• Understanding technical requirements and available solutions for building AR projects.
• Preparing one’s own AR project for mobile devices using automated AR editors.
• Documenting final work in the form of a poster with the description of results.


Given such a short time for presenting the main concepts of AR, teaching new
applications, preparing a final working project and documenting it, it is important to
carefully select tools and solutions adequate to the skill level of the group, which is
usually diverse in terms of technical advancement. Moreover, working with art students
differs significantly from teaching IT students. They require much more attention and an
individual approach, because most of them have little knowledge about the concepts
behind AR workflow and programming [2].


5.1 Introductory Lecture


The first day of the workshop begins with a 2-hour lecture on the artistic and IT
aspects of AR usage.

The artistic part focuses on the observation that we are living in times of easy access
to large amounts of information and sources of knowledge. In the rapid stream of data
coming from everywhere, we need to survive and safeguard our brains against overload.
By designing visual communication, we may grade levels of information complexity and
help users decide how much they wish to receive in one portion. Visual communication
can serve as a kind of package for complex information, ordered in smaller, graded
portions made visible with AR.

The IT part of the lecture presents how AR technically works and shows world- 
wide examples and demos of successful AR projects. Presented works were chosen 
carefully to show the broad spectrum of possible applications in various fields of life: 
advertising, marketing, shopping, entertainment, education, supporting complex tasks, 
navigation/sightseeing, architecture, military, medical etc. Special attention is paid to 
usage of AR in the art field by presenting projects of installations, objects, books etc. 


160 M. Wichrowski, E. Satalecka, and A. Wieczorkowska 


enriched by this technology. Many of them are works prepared by students during
previous workshops or regular AR classes held by Marcin Wichrowski at the Polish-
Japanese Institute of Information Technology [2]. Efforts are made to present the best
working examples, which students can test even during the lecture. This raises a lot of
interest and inspiration, especially among persons who have not had a chance to test
this technology personally before or are not convinced about the quality and reliability
of modern tracking technologies.

The lecture is followed by a one-hour brainstorming session to stimulate the students’
imagination and discuss the pros and cons of this technology. Students decide what
task they wish to undertake and, in a small seminar, report the aims, the users and the
methodology of their projects. They discuss how much AR could help them in
problem-solving during the design process. Decisions result in rough sketches and
initial projects, which are presented on the class forum and again discussed and
questioned by all of the participants. This helps improve the weak points and sometimes
gives a new perspective and inspires changes. Observations made by students during
these discussions are also very important from the IT point of view, and allow a better
understanding of what is technically possible to achieve in 5 working days. Each
student is supervised individually and has an opportunity to ask questions regarding the
artistic and IT scopes of the planned AR project.


5.2 Basic AR Tools and First Introductory Task


The rest of the first day is dedicated to preparing the working environment for
developing AR projects, based on carefully prepared tutorials and with the help of an IT
lecturer. Because of the usually low level of programming experience, we decided to use
Metaio applications, which serve as a simple introduction to building and experiencing
AR scenes in a very short time. Metaio offers an integrated environment which consists
of:


• Creator (Windows/MacOS)⁴: an automated AR authoring tool dedicated to creating AR
scenes for desktops and mobile devices,
• Junaio AR Browser (Android/iOS)⁵: a free mobile AR browser for loading AR channels
created with Metaio tools,
• Metaio Cloud⁶: an online hosting service for storing users’ projects.


Thanks to easy configuration and a simple workflow, this solution allows creating the
first working scenes in a really short time. It offers three tracking technologies:
image-based, object-based and environment-based. Creator allows embedding different
types of objects, such as images, text, videos/animations with alpha channel, animated
3D models, sounds, calendar events, links to websites, buttons for integrating with
social networks and 360° panoramas. Students are also taught how to program
interaction in AR scenes.


⁴ http://www.metaio.com/creator/
⁵ http://www.junaio.com/
Using image-based tutorials, students develop and test an introductory task, “3D
photo”, in the form of an AR scene under the close supervision of the IT lecturer. A
standard photograph is divided into several planes that are stored in separate PNG files
using Adobe Photoshop. Then the AR scene is created in Metaio Creator by placing these
files along the z-axis at varying distances from each other. The background of the photo
is recognized as a visual marker and triggers the remaining elements on it. This creates
the impression of a 3D look when the scene is observed from different angles using a
mobile device (Fig. 3). The task ensures a good understanding of the proposed AR
software and encourages students to test different scenarios.
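Why stacking layers at different distances produces a 3D look can be illustrated with a simple pinhole-camera model: when the device moves sideways, a layer closer to the camera shifts further on screen than the marker plane behind it, and this differential shift (motion parallax) is read as depth. The sketch below is our own illustration, not part of Metaio Creator; the focal length, marker distance, layer offsets, and the assumption that layers are moved toward the viewer along the z-axis are all hypothetical values chosen for the example.

```python
def screen_shift_px(focal_px, camera_shift_m, depth_m):
    """Horizontal image shift (pixels) of a point at depth_m when the camera
    translates sideways by camera_shift_m (pinhole model: x = f * X / Z).
    Negative values mean the point moves opposite to the camera motion."""
    return -focal_px * camera_shift_m / depth_m

def layer_parallax(layer_offsets_m, marker_distance_m, camera_shift_m, focal_px):
    """Shift of each photo layer relative to the marker (background) plane.

    layer_offsets_m: assumed distance of each layer in front of the marker,
    toward the viewer (0 = lying on the marker itself).
    """
    background = screen_shift_px(focal_px, camera_shift_m, marker_distance_m)
    return [
        screen_shift_px(focal_px, camera_shift_m, marker_distance_m - offset) - background
        for offset in layer_offsets_m
    ]

# Hypothetical setup: marker 40 cm away, layers 0, 5 and 10 cm in front of it,
# device moved 5 cm sideways, focal length of 1000 px.
shifts = layer_parallax([0.0, 0.05, 0.10], 0.40, 0.05, 1000)
print([round(s, 1) for s in shifts])  # [0.0, -17.9, -41.7]
```

Layers placed further in front of the marker shift more relative to the background, so moving the phone around the printed photo reveals the layering as depth.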


Fig. 3. Introductory AR scene - “3D photo” 


5.3 Experimenting with More Advanced Tools and Basic Programming


During the second day, students experiment with various more advanced AR solutions,
like object and environment tracking, creating 360° panoramas, or building and placing
3D objects in the surrounding space. We have worked with:


• Metaio Toolbox for object and environment tracking⁷,
• Microsoft Photosynth for creating 360° panorama photos⁸,
• Autodesk 123D Catch for creating 3D models from photos⁹,
• typical applications for image, video and 3D editing, like Adobe Photoshop, Adobe
After Effects, and Blender.


These applications were carefully picked from available solutions to guarantee the best
compatibility and provide a seamless, easy-to-follow workflow. For programming basic
interaction, the Augmented Reality Experience Language (AREL) was used. Metaio
Creator offers a simple built-in code editor for AREL. In most cases, interaction involves
adding touch events and interfaces to control objects on the screen.


⁷ https://dev.metaio.com/sdk/toolbox/
⁸ http://photosynth.net/
5.4 Prototyping and Developing Final Project


The third and fourth days are mainly dedicated to individual technical prototyping
supervised by the IT lecturer. The work done during these two days is essential for
acquiring the skills and finishing the projects on time. Students continue working with
the concepts proposed during the brainstorming session and, after their idea has been
accepted by the lecturers, start preparing working prototypes. This is the most creative
and important part of the workshop, and it requires a lot of support from both the
artistic and the technical supervisor. Problems that arise in projects are solved
individually with the lecturers. It often happens that the proposed idea turns out to be
impossible to finish in the available time or because of restrictions associated with the
selected authoring tool. This is a great lesson showing the real issues that may arise
during the implementation of AR projects and ways of dealing with them. Experimenting
with prototypes also encourages students to look for optimal solutions that work in
various conditions. The next step after solving the most significant problems in the
prototypes is the development of the final projects.


5.5 Final Presentation 


The last day is devoted to polishing projects and documenting them in the form of
posters, later presented in a dedicated public space at the university (Fig. 4). Viewers
can load the AR scenes of the finished projects using the free Junaio AR Browser. The
final exhibitions received a positive reception from visitors, and many of them were
interested in taking part in the AR experience. This contributes to the popularization of
AR technology among both students and other teachers, who are often interested in the
technical details and the possibility of using AR in their own projects.


Fig. 4. The final exhibition (The Royal Academy of Fine Arts, Antwerp) 
6 Students’ Works


Students quickly recognize the potential of AR and use it in a smart way, usually for
adding hidden messages and actions that enrich visual communication. Thanks to the
technology, they were able to use a regular coffee cup (Fig. 5), a spread of a book, a
postcard, a toy (Fig. 6), a cinema ticket or even the facade of a building as markers,
which act as keys for getting the hidden information or encourage users to interact with
the image/object using mobile devices. This can evolve into different scenarios: one can
read a moving poem while waiting for a friend in a café, as the poem is visible on the lid
of a coffee cup. When reading a picture book for kids, one is able to bring the main
characters to life and have them act their roles on the screens of tablets or phones for
educational purposes. A simple stamp on a postcard could be turned into a 3D object
presenting architecture, a short video commercial, or any other information we wish to
send from our holidays.


Fig. 5. Coffee cup with video projection 


Fig. 6. Toy with animated 3D objects 
A building’s facade (Fig. 7) may also be read as a marker and deliver a short or long
story about itself: the history of the building or an infographic which gives us
additional visual or verbal information.


Fig. 7. Building’s facade with additional typography 


One of the projects presents how the same landscape changes with the seasons.
Another use of AR is to present an interactive game based on a real-life 3D object whose
interior is filled with interactive virtual characters. Traditional drawings set on a cube
surface can deliver a nice animation, a kind of moving comic (Fig. 8). Even the real
space of a corridor may be mapped to an AR project and turned into a battlefield against
an interactive virtual monster. An ordinary corner or a selected part of a wall can
produce a sound composed by a student. A traditional art print illustrating a national
saga changes into a funny animation without damaging the book at all.

Users do not need any heavy, bulky objects to read or watch in order to participate in a
multimedia artistic experience. A mobile phone is enough to receive a really complex
message. Moreover, this additional information appears only on request, so users are
not overwhelmed by its constant presence.


7 Summary and Conclusions 


Our experience shows that students without any prior knowledge of AR can prepare
working projects in less than a week. The most important issue is to select proper AR
solutions which are easy enough to let students focus on the functionality they want to
achieve. AR supports delivering complex information ordered and packaged in visual
“containers”, which could give completely new opportunities for art projects. It helps to
organize and deliver information visually and verbally, and students use it in dynamic
and static contexts as part of their visual communication projects. They apply AR to
books or objects, as “real artefacts”: additional messages like instructions, infographics,
oral or visual information, motion images, translations, etc. They put effort into building
the visual aspects of their works, which sometimes even take the form of installations and
specially crafted objects that trigger the AR experience. Students are very inventive in
using these forms: for example, they create board games based on real or mocked-up
objects mixed with AR characters, add voice messages to objects designed for the
visually impaired, and use AR in children’s education design and for entertainment, as
part of music and video performances, and much more.


[Figure: poster “Friendly Finland” by Zaneta Strawiak; AR Workshops (Ewa Satalecka, Marcin Wichrowski), “Message from Finland”, Graphic Design Department, Aalto University, Helsinki, 18-22/11/2013]


Fig. 8. Poster documenting final work with QR codes for testing AR scene in Junaio


These prototypes herald the development of AR in visual communication design and
show that the possibilities are vast, limited mainly by the creativity of designers and the
software used. The popularity of new mobile devices and truly user-friendly AR creation
tools continues to grow. For these reasons, it pays to join the efforts of art and IT
teachers and try to incorporate AR into the curriculum as a very promising concept of
merging technology with visual communication.
Abstract. Serious games are emerging as innovative tools to promote opportunities
for human psychological growth and well-being. The aim of the present paper is
to introduce them as Positive Technologies. Positive Technology is an emergent
field based on both theoretical and applied research, whose goal is to investigate
how Information and Communication Technologies (ICTs) can be used to empower
the quality of personal experience at three levels: hedonic well-being, eudaimonic
well-being and social well-being. As Positive Technologies, serious games can
influence both individual and interpersonal experiences by nurturing positive
emotions, promoting engagement, and enhancing social integration and
connectedness. An in-depth analysis of each of these aspects is presented in the
chapter, supported by concrete examples.


Keywords: Positive psychology, positive technology, serious games, well- 
being. 


1 Introduction 


Serious applications of computer game technologies have become important resources
for today’s knowledge society. Their use and effectiveness have been broadly
acknowledged in different sectors, such as education, health, and business [1]. By
fostering continuous learning experiences blended with entertaining affordances,
serious games have the potential to shape new opportunities for human psychological
development and growth. They have in fact supported the creation of socio-technical
environments [2], where the interconnection between humans and technology
encourages the emergence of innovative ways of thinking, creative practices, and
networking opportunities. Further, serious games have proven capable of supporting
wellness and promoting happiness. That is why they can be considered “positive
technologies”. Based on the Positive Psychology [3] theoretical framework, the Positive
Technology approach claims that technology can increase emotional, psychological and
social well-being [4].

Seligman and Csikszentmihalyi identified Positive Psychology as the scientific
study of "positive personal experience, positive individual traits, and positive
institutions" [5,6]. By focusing on human strengths, healthy processes, and fulfillment,
Positive Psychology aims to improve the quality of life, as well as to increase wellness
and resilience in individuals, organizations, and societies.


R. Shumaker and S. Lackey (Eds.): VAMR 2014, Part II, LNCS 8526, pp. 169 2014. 
The link with accurate and scientific methodological practices [7] has become the
engine of interventions to study and promote the optimal expression of thoughts,
emotions and behaviors. In particular, Keyes and Lopez [8] argued that positive
functioning is a combination of three types of well-being: (i) high emotional well-being
(hedonic level), (ii) high psychological well-being (eudaimonic level), and (iii) high
social well-being (social level). This means that Positive Psychology identifies three
characteristics of our personal experience (affective quality, engagement/actualization,
and connectedness) that serve to promote personal well-being.

Similarly, the Positive Technology approach claims that technology can influence 
both individual and interpersonal experiences by fostering positive emotions, promot- 
ing engagement, and enhancing social integration and connectedness. Positive 
Technology is an emergent field based on both theoretical and applied research, 
whose goal is to investigate how Information and Communication Technologies 
(ICTs) can be used to empower the quality of personal experience. 

Starting from an introductory analysis of the concept of well-being as it has been
framed by Positive Psychology research, this paper reflects on the nature and the role
of serious games as positive technologies. In particular, it discusses how they can
support and train the optimal functioning of both individuals and groups by
contributing to their well-being.


2 Fostering Emotional Well-Being: The Hedonic Perspective 


Kahneman, Diener, & Schwarz [9] conceptualized the idea of emotional well-being
within the hedonic perspective. They in fact defined hedonic psychology as the study
of "what makes the experience pleasant or unpleasant". Among the different ways to
evaluate pleasure in human life, a large number of studies have focused on the concept
of subjective well-being (SWB), "a person’s cognitive and affective evaluation of his or
her life as a whole" [10,11]. At the cognitive level, the opinions expressed by
individuals about their life as a whole, and their level of satisfaction with specific life
domains, such as family or work, become fundamental. At the emotional level, SWB is
related to the presence of positive emotional states and the absence of negative moods.

This point is of particular interest to the hedonic perspective. Unlike negative emotions,
which are essential to provide a rapid response to perceived threats, positive emotions
can expand cognitive-behavioral repertoires and help to build resources that contribute
to future success [12,13].


2.1 How Can Technology and Serious Games Foster Hedonic Well-Being?


The hedonic side of Positive Technology analyzes the ways technologies can be used
to produce positive emotional states. For example, Riva and colleagues tested the
potential of Virtual Reality (VR) for inducing specific emotional responses, including
positive moods [16] and relaxing states [17,18]. More recently, other studies have
explored the potential of emerging mobile devices for harnessing positive emotions.
Serious games, and games in general, are closely connected to positive emotions and to
a wide variety of pleasant situational responses that make gameplay the direct
emotional opposite of depression [19].

First, serious games can evoke sensorial pleasure through graphics, usability, game aesthetics, and visual and narrative stimuli. This point has been analyzed by emerging trends such as engineering aesthetics 2.0 [20] and hedonic computing [21], whose results may significantly influence game design.

Second, serious games foster epistemophilic pleasure by bridging curiosity with the desire for novelty within a protected environment, where individuals can explore the complexity of the self and develop mastery and control. Empowered by the affordances and possibilities of new media, serious games can promote a dynamic equilibrium between excitement and security.

Third, serious games promote the pleasure of victory and, by supporting virtual interactions with real people, nurture social pleasure, promoting collaborative and competitive dynamics as well as opportunities for communication and sharing, even outside the context of the game [22].

Games have also traditionally been recognized as offering cathartic pleasure, as they act as a release valve for emotional tension, anger, and aggressiveness.

Finally, pleasure has a neural counterpart. An interesting example is dopamine, a neurotransmitter that affects the flow of information in the brain and is often involved in pleasant experiences, as well as in different forms of addiction and learning. In a classic study by Koepp and colleagues monitoring the effects of video games on brain activity, a significant increase in dopamine release was measured, of a magnitude comparable to that induced by amphetamines [23].

Good examples of serious games explicitly designed to foster positive emotion are “The Journey to Wild Divine” (http://www.shokos.com/The_Journey_to_Wild_Divine.html) and Eye Spy: the Matrix, Wham!, and Grow your Chi!, developed in Dr Baldwin's Lab at McGill University (http://selfesteemgames.mcgill.ca). In The Journey to Wild Divine, the integration of usable biofeedback sensors with computer software allows individuals to enhance their subjective well-being through a 3D graphic adventure, in which wise mentors teach skills to reduce stress and increase physical and mental health.

Eye Spy: the Matrix, Wham!, and Grow your Chi! are projects that aim to empower people with low self-esteem by, respectively, training them to ignore rejection information, using positive conditioning, and focusing on positive social connections [24,25].


3 Promoting Psychological Well-Being: The Eudaimonic Perspective


This perspective is associated with the possibility of fully realizing human potential through the exercise of personal virtues in pursuit of goals that are meaningful to the individual and society [4,9]. In this case, happiness no longer coincides with a subjective form of well-being but with a psychological one. Psychological well-being is based on six elements [26]: self-acceptance, positive relationships with others, autonomy, environmental mastery, purpose in life, and personal growth. An author who has fully captured the complexity of the eudaimonic perspective is Mihaly Csikszentmihalyi, who formalized the concept of flow [27,28], a positive, complex, and highly structured state of deep involvement, absorption, and enjoyment [28]. The basic feature of this experience is a perceived dynamic equilibrium between high environmental opportunities for action (challenges) and adequate personal resources for facing them (skills). Additional characteristics are deep concentration, clear rules and unambiguous feedback from the task at hand, loss of reflective self-consciousness, control over one’s actions and environment, an altered experience of time, and intrinsic motivation.
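The challenge/skill balance described above can be illustrated with a simplified version of the channel model often used in experience-sampling studies of flow. This is a sketch under stated assumptions, not a procedure from [27,28]: challenge and skill are assumed to be self-rated on the same numeric scale, and the `tolerance` band for "balanced" and the `high` threshold are hypothetical parameters.

```python
def flow_channel(challenge: float, skill: float,
                 tolerance: float = 1.0, high: float = 4.0) -> str:
    """Classify one moment by its challenge/skill balance.

    `tolerance` and `high` are hypothetical thresholds on an assumed
    common rating scale (e.g. 1-7); they are not taken from the source.
    """
    if challenge - skill > tolerance:
        return "anxiety"   # demands clearly exceed skills
    if skill - challenge > tolerance:
        return "boredom"   # skills clearly exceed demands
    if min(challenge, skill) >= high:
        return "flow"      # balanced, and both are high
    return "apathy"        # balanced, but both are low


if __name__ == "__main__":
    for c, s in [(6, 6), (7, 2), (2, 6), (2, 2)]:
        print(c, s, flow_channel(c, s))
```

Note that balance alone is not enough: a balanced but low-challenge moment falls into "apathy", reflecting the requirement that challenges be high and skills adequate.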


3.1 How Can Technology and Serious Game Promote Eudaimonic Well-Being?


Scholars in the field of human-computer interaction are starting to recognize and address the eudaimonic challenge. For example, Rogers calls for a shift from “proactive computing” to “proactive people,” where “technologies are designed not to do things for people but to engage them more actively in what they currently do” [29].

Further, the theory of flow has been used extensively to study user experience with Information and Communication Technologies, such as the Internet [30], virtual reality [31,32], social networks [33], video games [34], and serious games [35].

Bergeron [35] defined serious games as interactive computer applications, with or 
without a significant hardware component, that (i) have challenging goals, (ii) are fun 
to play with and/or engaging, (iii) incorporate some concepts of scoring, (iv) impart to 
the user skills, knowledge, or attitude that can be applied in the real world. 

Interestingly, all of these aspects map easily onto Csikszentmihalyi's theory of flow. Games are in fact "flow activities" [27,28], as they are intrinsically able to provide enjoyable experiences [22] by creating rules that require learning skills, defining goals, giving feedback, making control possible, and fostering a sense of curiosity and discovery.

In addition, the intrinsic flow potential of serious games can be further enhanced by (i) identifying an information-rich environment that contains functional real-world demands; (ii) using technology to enhance the subjects' level of presence in the environment; and (iii) allowing the cultivation of this optimal experience by linking it to the subject's actual experience [3]. To achieve the first two steps, it is essential to consider the following game design elements [36]:


• Concentration. Serious games should stimulate a mental focus on in-game dynamics by providing a set of engaging, differentiated stimuli worth attending to, which limit the influence of external variables. Along with other aspects, concentration can result in hyperlearning processes, which consist of the mental ability to focus totally on the task by using effective strategies aligned with personal traits [50];

• Challenge. As noted by Gee [37], who claims that the game experience should be "pleasantly frustrating", challenges have to match players’ skills and level and support their improvement throughout the game. During specific stages of the game, "fish tanks" (stripped-down versions of the real game, where gameplay mechanisms are simplified) and "sandboxes" (versions of the game where things are less likely to go wrong) can support this dynamism;

• Player Skills. Games must support player skills and mastery through game usability and specific support systems and rewards;

• Control. It is fundamental for players to experience a sense of control over what they are doing, as well as over the game interface and input devices;

• Clear goals. Games should provide players with specific, measurable, achievable, relevant, and time-bounded goals;

• Feedback. Players have to be supported by feedback on the progress they are making, on their actions, and on the ongoing situations represented in the virtual environment;

• Immersion. Players should become less aware of their surroundings and emotionally involved in the game dynamics;

• Social Interaction. Games should create opportunities for social interaction by supporting competition, collaboration, and sharing among players.
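The Challenge and Feedback elements imply a loop that keeps difficulty close to the player's current skill. The following minimal dynamic difficulty adjustment sketch is an illustrative assumption, not a method from [36] or [37]. It assumes the game logs whether each recent attempt succeeded, and it nudges a hypothetical difficulty level toward an assumed target success rate of 70%, so that players win often but not always.

```python
from typing import Sequence


def adjust_difficulty(level: float, recent_outcomes: Sequence[bool],
                      target: float = 0.7, step: float = 0.5,
                      lo: float = 1.0, hi: float = 10.0) -> float:
    """Nudge a difficulty level toward a target success rate.

    All parameters (target rate, step size, level bounds) are
    hypothetical tuning values chosen for illustration.
    """
    if not recent_outcomes:
        return level  # no evidence yet: keep the current challenge
    success_rate = sum(recent_outcomes) / len(recent_outcomes)
    if success_rate > target:
        level += step  # player is winning too easily: raise the challenge
    elif success_rate < target:
        level -= step  # player is struggling: lower the challenge
    return max(lo, min(hi, level))  # keep the level inside its bounds
```

Called every few attempts, this keeps the challenge/skill ratio near balance; the fixed step keeps changes gradual so players do not notice abrupt jumps in difficulty.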


An interesting example of a eudaimonic serious game is SuperBetter, developed by Jane McGonigal (https://www.superbetter.com/). SuperBetter helps people achieve their life goals by working on personal resilience. Applying the aforementioned elements helps people be curious, optimistic, and motivated, and promotes high levels of user engagement.


4 Working on Social Well-Being: The Social Perspective 


Social well-being indicates the extent to which individuals are functioning well in their social system, and it is defined along five dimensions [39]:


• Social integration, conceptualized as the evaluation of the quality of one's relationship with a community or society;

• Social contribution, evidenced by the perception of having something important to offer to society and the world at large;

• Social coherence, determined by the meaning given to the quality, organization, and operations that make up the social sphere;

• Social acceptance, based on the belief that people's proactivity and agency can foster the development of societies and culture;

• Social actualization, determined by the evaluation of the potential and the trajectory of society.
4.1. How Can Technology and Serious Game Promote Social Well-Being?


At this level, the challenge for Positive Technology is to use new media to support and improve the connectedness between individuals, groups, and organizations, and to create a mutual sense of awareness. This is essential to the feeling that other participants are there and to creating a strong sense of community at a distance.

Short and colleagues [40] introduced the term "social presence" to indicate the degree of salience of the other person in a mediated environment and the consequent salience of their interpersonal interaction. On this point, Riva and colleagues [41] argued that an individual is present within a group if he/she is able to put his/her own intentions into practice (presence) and to understand the intentions of the other group members (social presence). Today, social presence is amplified by advanced ICT systems, and these technologies can promote the development of a peak collaborative state experienced by the group as a whole, known as “networked flow” [42]. Sawyer [43,44], who called this state "group flow", identified several conditions that facilitate its occurrence: a common goal, close listening, complete concentration, control, blending egos, equal participation, familiarity, communication, and the potential for failure. As noted by Gaggioli and colleagues [42], networked flow occurs when high levels of presence and social presence are matched with a state of "liminality". In particular, three preconditions have to be satisfied:


• group members share common goals and emotional experiences, so that individual intentionality becomes a we-intention [45] able to inspire and guide the whole group;

• group members experience a state of liminality, a state of "being about" that breaks the previously defined homeostatic equilibrium;

• group members identify in the ongoing activity the best affordances to overcome the situation of liminality.


Social presence and networked flow can be fostered by serious games as well. An interesting study by Cantamesse, Galimberti, & Giacoma [46], for example, examined the effect of playing the online game World of Warcraft (WoW) both on adolescents’ social interaction and on the competences they developed through it. The in-game interactions, and in particular the conversational exchanges, turned out to be a collaborative path toward the joint definition of identities and social ties, with reflection on in-game processes and out-of-game relationships. Another interesting example is Mind the Game™, developed by our research group [47] to enhance the optimal functioning of groups. This serious game not only promotes cooperative and competitive processes but also stimulates a proactive co-construction of knowledge that fosters the emergence of we-intentions, networking opportunities, and in-group dynamics.
5 Conclusion


In this paper we discussed the role of serious games as positive technologies. Drawing on the theoretical framework of Positive Psychology and the Positive Technology approach, we demonstrated that these applications are able to promote hedonic well-being, eudaimonic well-being, and social well-being.

First of all, serious games can foster positive emotional states by enhancing the different forms of pleasure intrinsic to them. In particular, we discussed the importance of sensorial, epistemophilic, social, cathartic, and neural pleasure.

Secondly, serious applications of computer game technologies can be associated with flow experiences and, thus, with psychological well-being. Through high levels of presence and flow, these technologies can promote optimal experiences marked by absorption, engagement, and enjoyment.

Finally, serious games are able to increase connectedness and integration. To 
achieve such a complex goal they have to work on a mutual sense of awareness, as 
well as social presence and situations of liminality. In this way, groups can access 
peak creative states, known as networked flow optimal experiences, that are based on 
shared goals and emotions, collective intentions, and proactive behaviours. 
Abstract. In this work, we choose Chinese Opera as our research material, hoping to increase people’s acceptance of and closeness to the performance. The theme is “Havoc the Dragon Palace”, a chapter of the sixteenth-century Chinese novel “Journey to the West” by Wu Cheng’en. We developed a rendering technique that we named “Live Video Mapping”. It focuses both on detecting human movement and on interacting with the background video in real time. The virtual images on the stage not only create an attractive view but also give the audience the illusion that the space is expanding and being enhanced. Taking these factors into account, this study explores the possibilities of interactive video mapping and aims to understand and increase the affinity of Chinese Opera in order to promote its value.


Keywords: Journey to the West, Chinese Opera, real-time interactive experience, live video mapping.


1 Introduction 


New experimental interactive art has been integrated with various artistic fields such as digital art, sound, lighting, photography, games, and virtual reality. Works of art remain in a state of constant molting, drawing their vitality from this movement. Since the beginning of the digital age, media art has broken away from fixed forms, rejecting what is static and bringing new elements into the works. A principle of this new art is audience participation: the work is completed only through the audience's active participation. This is also a new standard of art and a fundamental principle of the structure of media artworks [1]. Briefly analyzed from the viewpoint of visual media, the trend has moved from static media through dynamic media to experiential media. From that standpoint, there are various types of video mapping art, such as media facades, stage performance art, and object mapping in exhibitions. It is therefore considered a proper subject for this paper. Given the spread and effective use of video mapping and video media, we believe it is necessary to examine how they are currently used in the performing arts.

In this work, we choose Chinese Opera as research material and propose experience-based cultural content using “Live video mapping”. Chinese Opera is one of the finest ethnic arts for expressing and representing the symbolism of Chinese characteristics, ethnicity, and personality. This traditional performance has gone through many changes along with social change, and globalization and the modernization of media have become the biggest challenges for Chinese Opera. Cultural demand is needed to expand the Chinese Opera market, which has shrunk considerably. When foreigners, and even some young Chinese people, watch Chinese Opera for the first time, it may feel strange and distant to them, and they find it hard to follow the content of the play and the actors’ lines. This makes it difficult for people to get close to Chinese Opera. The Monkey King (Sun Wukong in Chinese) is famous from the Chinese novel “Journey to the West”. Several plays are derived from the story; we chose one chapter, “Havoc the Dragon Palace”, to help people experience it. The purpose of this work is to guide people to act out the performance and make them feel that they have become part of the animation, using the live video mapping system to increase the affinity of Chinese Opera.

This paper is organized as follows. Chapter 2 explains the “Live Video Mapping” system and shows some examples of similar content. Chapter 3 describes Chinese Opera and its limitations and discusses the plays of the Monkey King. Chapter 4 presents the interactive Chinese Opera content “Havoc the Dragon Palace”: defining the concept of the content based on the scenario, making the visual content, and setting the display location to implement the final work. Finally, conclusions are presented in Chapter 5.


2 Live Video Mapping and Precedent 


2.1 Live Video Mapping 


Projection mapping, also known as video mapping and spatial augmented reality, is a projection technology used to turn objects, often irregularly shaped, into a display surface for video projection [2]. It uses optical illusion, projecting overlaid video to provide a highly immersive experience by extending real-world space. The ways content can be represented have diversified with the development of related technologies. Today, techniques such as distortion calibration and real-time object tracking allow projection mapping onto moving objects, going beyond the limits of earlier approaches. We therefore developed this technology and used it in performances and exhibitions to project onto the human body. Live video mapping provides an interface between the audience and interactive content, creating a story and enabling communication. To make Chinese Opera content experience-based, we use the human body or the performer’s costume as a screen to create a creative stage [3]. The term “live video mapping” is our own, so there is no dictionary definition of it. This work requires both visual content and programming. Guide animations are preset for the expected behavior of the audience, and, within a certain range, the audience's free actions are reflected in the work. The audience thus comes to appreciate the art and has an extraordinary experience.
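Projecting onto a tracked body requires mapping positions in the Kinect camera image to projector pixels. The paper mentions calibrating distortion but does not specify a method; one common approach for a roughly planar projection surface is a 3x3 homography estimated from four point correspondences. The sketch below assumes that approach, uses hypothetical correspondences, and ignores lens distortion and depth variation across the body.

```python
import numpy as np


def fit_homography(src, dst):
    """Estimate a 3x3 homography H (with H[2, 2] = 1) from exactly four
    point pairs, so that dst ~ H @ [x, y, 1] in homogeneous coordinates."""
    rows, rhs = [], []
    for (x, y), (u, v) in zip(src, dst):
        # u = (h11 x + h12 y + h13) / (h31 x + h32 y + 1), rearranged to be
        # linear in the eight unknowns; likewise for v.
        rows.append([x, y, 1, 0, 0, 0, -u * x, -u * y])
        rows.append([0, 0, 0, x, y, 1, -v * x, -v * y])
        rhs.extend([u, v])
    h = np.linalg.solve(np.array(rows, dtype=float), np.array(rhs, dtype=float))
    return np.append(h, 1.0).reshape(3, 3)


def camera_to_projector(H, x, y):
    """Map a camera pixel (x, y) to projector pixel coordinates."""
    u, v, w = H @ np.array([x, y, 1.0])
    return u / w, v / w


if __name__ == "__main__":
    # Hypothetical calibration: four camera pixels and the projector
    # pixels that land on the same physical points on stage.
    cam = [(0, 0), (640, 0), (640, 480), (0, 480)]
    proj = [(100, 50), (1180, 80), (1150, 700), (120, 690)]
    H = fit_homography(cam, proj)
    print(camera_to_projector(H, 320, 240))  # e.g. a tracked head joint
```

With more than four measured points, a least-squares fit (for example OpenCV's `findHomography`) is usually preferred; this sketch keeps to the minimal four-point case.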


180 X.-D. Huang et al. 


2.2 System 


In this work, we use a PC, a projector, a Kinect, and speakers (Fig. 1). Kinect is a line of motion-sensing input devices that enable users to control and interact with applications. The device features an RGB camera, a depth sensor, and a multi-array microphone running proprietary software, which together provide full-body 3D motion capture, facial recognition, and voice recognition capabilities [4]. The depth sensor consists of an infrared laser projector combined with a monochrome CMOS sensor, which captures video data in 3D under any ambient light conditions [5]. The device measures depth and returns three-dimensional X, Y, Z coordinates. The default RGB video stream uses 8-bit VGA resolution (640 x 480 pixels) at 30 frames per second. The Kinect can not only control Xbox games but can also be connected to a PC through a USB interface. There are diverse methods for developing interactive content with the Kinect. In this work, we use the Simple-OpenNI library for Processing. Processing is an open-source programming language that has promoted software literacy within the visual arts and visual literacy within technology. Simple-OpenNI uses the Skeleton API to track joints and enable auto-calibration. In addition, we use Pbox2d, a 2D physics library for simulating rigid bodies, to program a particle system that interacts with people. For the background visual content we used Autodesk Maya (3D animation, modeling, simulation, and rendering software), and for post-production we used Adobe After Effects (motion graphics, visual effects, and compositing software). The tracking data from Processing were transferred to Resolume Arena through the Syphon library, and the visual sources produced in After Effects were imported directly into Resolume (Fig. 2). Each image projected onto the subject can be controlled manually or automatically. A detailed account of the process is given in Chapter 4.
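How a depth value becomes X, Y, Z coordinates can be made concrete with a pinhole back-projection. The sketch below is an illustrative assumption, not the paper's pipeline: the intrinsics (a focal length of about 525 px and a principal point at the centre of a 640 x 480 image) are approximate values often quoted for the first-generation Kinect, and OpenNI-based libraries normally perform an equivalent conversion internally. The second function works out the raw data rate of the default RGB stream, assuming 3 colour channels of 8 bits each.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Intrinsics:
    # Approximate, assumed values for a 640 x 480 first-generation Kinect image.
    fx: float = 525.0  # focal length in pixels (x)
    fy: float = 525.0  # focal length in pixels (y)
    cx: float = 319.5  # principal point (x)
    cy: float = 239.5  # principal point (y)


def depth_pixel_to_xyz(u: float, v: float, depth_mm: float,
                       k: Intrinsics = Intrinsics()) -> tuple:
    """Back-project image pixel (u, v) with a depth reading in mm to
    camera-space X, Y, Z in mm (X right, Y down, Z forward)."""
    z = float(depth_mm)
    x = (u - k.cx) * z / k.fx
    y = (v - k.cy) * z / k.fy
    return x, y, z


def rgb_bytes_per_second(width: int = 640, height: int = 480,
                         channels: int = 3, bits_per_channel: int = 8,
                         fps: int = 30) -> int:
    """Uncompressed data rate of a video stream in bytes per second."""
    return width * height * channels * bits_per_channel // 8 * fps


if __name__ == "__main__":
    print(depth_pixel_to_xyz(400, 200, 2000))  # a point about 2 m from the sensor
    print(rgb_bytes_per_second())
```

For the default stream this gives 640 x 480 x 3 = 921,600 bytes per frame, or 27,648,000 bytes (about 221 Mbit) per second at 30 fps before any compression.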


Fig. 1. Hardware system
Fig. 2. Software system
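The particle system programmed with Pbox2d is not described in detail. As a rough, library-free sketch of the idea, the code below pushes 2D particles away from a tracked joint position (for example, a hand reported by the skeleton tracker) and integrates them with a semi-implicit Euler step. The repulsion radius, strength, damping, and time step are hypothetical values, and the integrator is a simple stand-in for, not a reproduction of, Box2D's rigid-body solver.

```python
import math
from dataclasses import dataclass


@dataclass
class Particle:
    x: float
    y: float
    vx: float = 0.0
    vy: float = 0.0


def step(particles, joint, radius=80.0, strength=500.0,
         damping=0.98, dt=1 / 30):
    """Advance particles one frame, pushing them away from a tracked joint.

    `joint` is an (x, y) position in screen pixels. radius, strength,
    damping and dt (one frame at an assumed 30 fps) are hypothetical
    tuning values.
    """
    jx, jy = joint
    for p in particles:
        dx, dy = p.x - jx, p.y - jy
        dist = math.hypot(dx, dy)
        if 0.0 < dist < radius:
            # Repulsion fades linearly to zero at the edge of the radius.
            accel = strength * (1.0 - dist / radius)
            p.vx += accel * dx / dist * dt
            p.vy += accel * dy / dist * dt
        # Damping stands in for friction so particles settle again.
        p.vx *= damping
        p.vy *= damping
        # Semi-implicit Euler: position uses the updated velocity.
        p.x += p.vx * dt
        p.y += p.vy * dt
```

Calling `step` once per tracking frame makes particles scatter where the participant moves and drift back to rest elsewhere; a physics library adds collisions and rigid-body shapes on top of this basic behaviour.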


2.3 Precedent of Similar Contents 


Since the Kinect appeared, media artworks have become richer. In the performing arts especially, the performer’s body or costume can be used as a projection screen, which reduces the constraints on representation and strengthens the effect. Because the shape of the screen is not fixed, it sometimes creates optical illusions. The following are some precedents of similar content.
“Kinect Illusion” (Fig. 3) uses motion tracking together with the RGB and depth cameras. It is a multimedia music work that combines sound and video with the movements of the performers. The Kinect in front of the screen is connected to a Mac Pro, and the Kinect behind the screen is connected to a MacBook Pro; all videos are produced in Quartz Composer. Two performers perform an interactive dance.


Fig. 3. Kinect Illusion [6] 


“Puppet Parade” (Fig. 4) is an interactive installation that allows children to act as puppeteers, using their arms to control larger-than-life puppet creatures projected on the wall in front of them. Children can also step into the environment and interact with the puppets directly, for example by petting them or creating food for them to eat. This dual interactive setup allows children to perform alongside the puppets, blurring the line between the “audience” and the puppeteers and creating an endlessly playful dialogue between the children in the space and the children manipulating the puppet creatures.


Fig. 4. Puppet Parade [7] 


What interests people most in the above performances is that both contain interaction-based and communication-based content (real-time people tracking). This form of performance presents a new way of video mapping in which the audience can participate in the show, the visual sources can be reused, and the audience can easily accept the performance. In the same space and with the same background video, the audience can experience real-time image-processing interaction, while the performers use the same real-time interaction to develop the scenario.
3 Chinese Opera and Plays of the Monkey King


3.1 Chinese Opera 


According to statistics from the China National Academy of Arts in 1986, there are 374 kinds of traditional opera performed in China. The most popular is “Peking Opera” (Fig. 5), which is the form presented in this paper [8]. Peking Opera is a form of traditional Chinese opera that is also called Chinese Opera in Western countries; for better understanding, we use the term Chinese Opera in the rest of this paper. Chinese Opera is popular in Beijing and Tianjin in the north of China and in Shanghai in the south [9]. In the past, it was also called Jingxi, Pingxi, or Guoju, depending on the region. Chinese Opera was born when the “Four Great Anhui Troupes” (Sanqing Troupe, Sixi Troupe, Chuntai Troupe, Hechun Troupe) brought Anhui opera, now called Huiju, to Beijing in 1790 to celebrate the eightieth birthday of the Qianlong Emperor [10] on 25 September [11]. Chinese Opera is generally believed to have originated in southern Anhui and eastern Hubei and to have been fully formed by 1845 [12]. The main body of its melodies originated from Xipi and Erhuang. The melodies that accompany each play were also simplified and played with different traditional instruments than in earlier forms.


Fig. 5. The Chinese Opera [13] 


Chinese Opera is a traditional theatre that combines music, vocal performance, mime, dance, and acrobatics [14]. Although it is a product of traditional culture, there is in fact a huge distance between this art form and today’s audiences. The Chinese government has made great efforts to advocate, protect, and pass on traditional culture. For example, it has taken part in special Chinese Opera performances many times and has organized Chinese Opera troupes to perform abroad. Chinese Opera indeed plays an important role in international cultural exchange. The country is making considerable efforts to train actors and encourage successors [10]. Qi Xiaoyun, a famous performer of the Jing role, performed William Shakespeare’s “Othello” as Chinese Opera in 1982. As the first woman to perform Chinese Opera in English, she attracted wide attention at the time. She also performed the Ancient Greek tragedy “Bakchai”, “ChiSangZhen (Red Mulberry Town)”, “ZhaMeiAn (Judge Bao and the Qin Xiang)”, and “ChuSanHai (Getting Rid of the Three Evils)” in English [10]. Chinese Opera is a substantial channel for social education and entertainment, and is therefore one of the tools for increasing awareness of Chinese culture. It also plays an important role in the development of the national economy and civilization.


An Experience-Based Chinese Opera Using Live Video Mapping 183 


3.2. The Limitation of Chinese Opera 


Chinese Opera presents dramatic plays and characters by combining four artistic methods: singing, dialogue, dancing, and martial arts. Singing intensifies the appeal of the art through all kinds of tones; because China has different dialects in each region, it is hard to understand without subtitles. Dialogue complements the singing and is full of musicality and rhythm. Dancing refers to body movements that require high performing skills. For example, circling with a whip in hand means riding a horse; simply walking around means a long journey; waving a cloud-patterned flag means the character is in the wind or under the sea. Martial arts combine and transform traditional Chinese combat exercises into dance [15].

There are four main roles in Chinese Opera: Sheng, Dan, Jing, and Chou. Sheng is the main male role. Dan refers to female roles; the four most famous Dan performers are Mei Lanfang, Cheng Yanqiu, Shang Xiaoyun, and Xun Huisheng. Jing is a painted-face male role and Chou is a male clown role [16]. Although the four roles are subdivided, all of them require professional performance. The acting skills are intimately connected with costume, facial painting, and props. Costumes and facial painting (Lianpu) take on added importance: costumes help to clarify the rank of the characters, and Lianpu has been formed through dramatic artists’ long-term practice and their awareness and judgment of the roles in the plays. The colors used in costumes and Lianpu are the colors of the traditional Chinese five elements, and the patterns are also traditional. Chinese Opera contains much more besides; in a volume such as this, only a bare general sketch of it can be given. To appreciate Chinese Opera, or even fall in love with it, people need to understand the story, the subtitles, the roles, and the meaning of the gestures, and, more importantly, have knowledge of traditional Chinese culture. This is why more and more young people do not like Chinese Opera. Few people know it well, and many do not even like to watch it, which limits their opportunities to get close to the performance. They think traditional opera reflects the life of ancient people, far removed from their own, so Chinese Opera is considered difficult to comprehend [10].


3.3. The Plays of the Monkey King


There are more than 5,800 Chinese Opera titles [17]. More than 300 episodes have been performed many times and have drawn large audiences, for example “Jiang Xiang He”, “Yu Zhou Feng”, “Zhui Han Xin”, “Ba Wang Bie Ji”, “Gu Cheng Hui”, and so on [18]. The stories in Chinese Opera normally come from classic novels. There are 36 episodes derived from the famous Chinese classic novel “Journey to the West” of the Ming Dynasty. Sun Wukong, also known as the Monkey King, is its main character and is known around the world. Chinese people like him for his manliness and bravery. In Chinese Opera, the performer is made up as a monkey and dazzles the audience with agile movements. The Monkey King plays enjoyed their heyday from 1937 to 1942, and people called them “plays of the Monkey King” instead of “plays of Journey to the West”.
With the development of computer technology and means of communication, as well as economic growth, internationalization and globalization have been accelerating since the end of the 20th century. China in particular, with its abundant cultural resources and wide market, is globalizing very fast, and it is also working hard to revitalize its cultural content. In the following, we give an example: the Monkey King’s “Havoc in Heaven” (Fig. 6), a performance that uses the projection mapping technique. The original Monkey King play requires the audience to imagine the story and the space of heaven, but here the performers present brilliant graphics and a variety of spaces to the audience. In addition, they create virtual characters to fight the Monkey King, which enhances the main character’s bravery. This content is very suitable for digital media performance: no translation is needed, and people without any background in Chinese Opera can easily experience the performance.


Fig. 6. Digital media performance “Havoc in Heaven” [19] 


4 An Experience-Based Chinese Opera “Havoc the Dragon Palace”


4.1. The Production of the “Havoc the Dragon Palace” 


The story of “Havoc the Dragon Palace” is the Monkey King’s first havoc. There are three havocs in the novel; this one takes place mainly under the sea, which distinguishes it from “Havoc in Heaven”. After finishing his training in magical and martial skills, the Monkey King returned to Huaguo Mountain, where he gathered followers and proclaimed himself king. He then visited the Dragon Palace under the East Sea to ask the Dragon King for a weapon. There he inadvertently discovered the “Ocean-Pacifying Needle” (the Golden Cudgel), a treasure of the Dragon Palace. He asked the Dragon King to give it to him as a gift, but the Dragon King refused, so the Monkey King wreaked havoc on the Dragon Palace. In the end, the Monkey King obtained the treasure he desired and returned to Huaguo Mountain. For use in the experience-based Chinese Opera, the story had to be adapted, as shown in Table 1.
Table 1. Story table

Stage          Content
Introduction   The Monkey King visits the Dragon Palace and asks the Dragon King for a weapon.
Development    The Monkey King inadvertently discovers the “Ocean-Pacifying Needle”.
Turn           The Dragon King refuses to give it to him, so the Monkey King wreaks havoc on the Dragon Palace.
Conclusion     The Monkey King obtains the treasure and holds a victory ceremony.


There are three main characters in this work: the Monkey King, the Dragon King, and
the Conductor (Fig. 7). The Monkey King is skilled in martial arts and knows the 72
transformations; he also symbolizes passion, freedom, bravery, optimism, and good luck.
He wears a pheasant-tail crown, and his face is painted red and white. His eyes are
yellow because he once swallowed a pill. He is performed as a Sheng role, and his
costume combines warm reds and yellows to give the audience an impression of justice.
The Dragon King is a dignified but brutal, hypocritical, and stubborn character. He is
performed as a Jing role with black and white as the main colors of his Lianpu (facial
makeup), and the arms and lower body of his costume are decorated with dragon smoke
tails. The Conductor is a 2D shadow figure outlined in gold who tells the audience what
to do next. He prompts the required gestures when needed and disappears once his job
is done. If the Conductor's animation is removed, the content could be used in a real
Chinese Opera performance.


Fig. 7. (a) The 3D Monkey King. (b) The 3D Dragon King. (c) The 2D Conductor 


Chinese shadow play is a traditional performance similar to Chinese Opera, except
that it is performed with puppets. It can be shown anytime and anywhere as long as the
environment is dark, and audiences can freely try the puppetry after the show. I believe
experience-based content should likewise be designed for portability and convenience so
that audiences can approach it easily. During the experience, it feels like watching an
animation; the visual contents are shown in Fig. 8.
(a)
000 Pre-Stage: The show begins when the Kinect tracks a person.
001 Conductor and human tracking: The curtain rises, the Conductor appears, and the
Monkey King mask is overlaid on the person.
002 Track the people: The Monkey King, at the Huaguo Mountain, starts his fantastic
adventure.
003 Dragon Palace under the East Sea: Seawater rises and the Dragon Palace appears ahead.

(b)
004 Put on the weapons: The Monkey King tries several weapons by touching them.
005 Ocean-Pacifying Needle: He discovers the “Ocean-Pacifying Needle” (it becomes
smaller when users come close).

(c)
006 Havoc the Dragon Palace: Brandish the Golden Cudgel and wreak havoc on the Dragon
Palace.
007 Destroy the Dragon Palace: The Dragon Palace is destroyed and splinters fall.

(d)
008 Ceremony of victory: The fire monkey covers the whole screen and roars in victory.
009 The end.

Fig. 8. (a) Introduction. (b) Development. (c) Turn. (d) Conclusion
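The scene flow in Fig. 8 is essentially a linear sequence that waits until the depth camera tracks a visitor. A minimal Python sketch of such a sequencer is shown below; the class name `SceneSequencer`, the `update` signature, and the rule that each later scene advances when a hypothetical `scene_finished` flag is raised are assumptions for illustration, not the system's actual implementation.

```python
from dataclasses import dataclass

# Scene titles from Fig. 8; the transition rules below are a hypothetical sketch.
SCENES = [
    "000 Pre-Stage",
    "001 Conductor and human tracking",
    "002 Track the people",
    "003 Dragon Palace under the East Sea",
    "004 Put on the weapons",
    "005 Ocean-Pacifying Needle",
    "006 Havoc the Dragon Palace",
    "007 Destroy the Dragon Palace",
    "008 Ceremony of victory",
    "009 The end",
]


@dataclass
class SceneSequencer:
    """Plays the scenes in order, waiting at Pre-Stage until a visitor is tracked."""
    index: int = 0

    @property
    def current(self) -> str:
        return SCENES[self.index]

    def update(self, person_tracked: bool, scene_finished: bool) -> str:
        """Advance the show by one tick and return the title of the active scene."""
        if self.index == 0:
            # Pre-Stage: start the show only once the depth camera tracks someone.
            if person_tracked:
                self.index = 1
        elif scene_finished and self.index < len(SCENES) - 1:
            # Later scenes advance when their (hypothetical) completion flag is set.
            self.index += 1
        return self.current
```

In a real installation, `person_tracked` would come from the depth camera's body-tracking output and `scene_finished` from the animation or gesture logic of each scene; both are left abstract here.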


4.2 The Realization 


The exhibition uses one PC, one projector, one Kinect, and speakers. “Havoc the
Dragon Palace” is divided into two parts: visual production and image processing. The
visual production part includes the background images and two characters, the Dragon
King and the Conductor. Because nobody wants to play a supporting role or the loser, the
Dragon King is produced as part of the background. Overlaying the Monkey King on the
audience member is handled by image processing. To keep the model data behaving
naturally, only the Monkey King's upper body is modeled; the rest of the body shows the
real person as captured by the RGB camera. In Chinese Opera the Monkey King actually
wears a dragon robe, but I designed armor to make him look fantastic and fashionable.
The following effects are also produced by image processing: the Monkey King chooses
weapons in the Dragon Palace by touching them, the “Ocean-Pacifying Needle” becomes
smaller when users come close, particle effects appear when the Golden Cudgel is
brandished, and when the Dragon Palace is destroyed, splinters fall onto the Monkey
King’s body. Because the Monkey King can transform into various things, including
copies of himself, it is logical for several audience members to take part at the same
time. Fig. 9 shows photographs taken at the exhibition.


Fig. 9. Real shot of the exhibition 
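Two of the interactions described above, touching a weapon to choose it and shrinking the “Ocean-Pacifying Needle” as the user approaches, can be sketched as a hit test and a distance-to-scale mapping. The sketch below assumes the hand position is already available in screen coordinates and the user's distance in meters from the depth camera; the function names, the 1.0 m to 3.5 m range, and the linear mapping are hypothetical choices, not values from this work.

```python
from dataclasses import dataclass
from typing import Dict, Optional, Tuple


@dataclass
class Box:
    """Axis-aligned screen-space region of a virtual weapon (pixels)."""
    x: float
    y: float
    w: float
    h: float

    def contains(self, px: float, py: float) -> bool:
        return self.x <= px <= self.x + self.w and self.y <= py <= self.y + self.h


def touched_weapon(hand_xy: Tuple[float, float], weapons: Dict[str, Box]) -> Optional[str]:
    """Return the name of the first weapon whose region contains the hand point, else None."""
    px, py = hand_xy
    for name, box in weapons.items():
        if box.contains(px, py):
            return name
    return None


def needle_scale(user_depth_m: float, near_m: float = 1.0, far_m: float = 3.5,
                 min_scale: float = 0.3, max_scale: float = 1.0) -> float:
    """Shrink the needle as the user approaches: max_scale at far_m, min_scale at near_m.

    The range and scale limits are hypothetical and would need tuning on site.
    """
    # Clamp the distance into [near_m, far_m], then interpolate linearly.
    d = min(max(user_depth_m, near_m), far_m)
    t = (d - near_m) / (far_m - near_m)
    return min_scale + t * (max_scale - min_scale)
```

A linear ramp with clamping keeps the needle at a fixed size outside the tracked range, so noisy depth readings at the edges do not make it flicker between extremes.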


5 Conclusion 


This paper is concerned with communication between audiences and cultural assets,
raising people's awareness of Chinese Opera, which is close to us yet unfamiliar. The
experience shows that the interactive video mapping system delineates the subject and
the object of art, suggesting a future direction for digital media content. The content
strongly satisfies people by letting them appreciate and experience it at the same time,
and its visual content focuses on storytelling rather than a sequence of graphics, which
clearly differentiates it from one-off experiential content.

Media art content can easily become overly dependent on technology at the expense of
artistic creativity. We suggest selecting traditional materials with strong story value,
which can overcome the limitations of region and age, to create successful content. A
better way to develop content is to find cherished stories within a culture and then
reconstruct these sources in combination with new technology. In addition,
experience-based content must be delivered through an experience, such as real-time
interactive projection mapping. Many cultural content projects fail because their grasp
of the culture and of specialized knowledge is superficial.

The experience-based Chinese Opera explored how one of China's traditional art forms
can be promoted, helping people understand Chinese Opera and feel closer to it. After
this experience, I hope to have a chance to perform together with a real performer, so
that audiences become familiar with Chinese Opera and it gains followers worldwide. I
also sincerely hope this work will be recognized worldwide as artistic cultural content.
Furthermore, exploring the sustainability of communication between cultural assets and
the public could reveal the potential of this content and expand the scope of the study.


Acknowledgement. This research was supported by Basic Science Research Program 
through the National Research Foundation of Korea (NRF) funded by the Ministry of 
Education, Science and Technology (2010-0023438). 


References 
1. Eungyung, O.: New Media Art. Yonsei University Publishing House, Seoul (2008)
2. Wikipedia, http://en.wikipedia.org/wiki/Projection_mapping
3. Kim, M.: A study on space and object expression using projection mapping. Riss Trans.
Master's thesis, Video Contents, p. 5 (2011)
4. Wikipedia, http://en.wikipedia.org/wiki/Kinect
5. The seattlepi, http://blog.seattlepi.com/digitaljoystick/2009/06/01/e3-2009-microsoft-at-e3-several-metric-tons-of-press-releaseapalloza/
6. Yoon, K.: Research on interactive multimedia productions with Kinect. Riss Trans.
Master's thesis, Multimedia Design, p. 32 (2012)
7. Interactive installations, environments and R&D,
http://design-io.com/projects/PuppetParadeCinekid/
8. Zhang, G.: The contemporary Chinese Opera. Theatre in China, Beijing (2010)
9. Wichmann, E.: Tradition and Innovation in Contemporary Beijing Opera Performance.
The MIT Press, Cambridge (1990)
10. Xu, C.: Peking Opera. Cambridge University Press, Cambridge (2012)
11. Elliott, M.C.: Emperor Qianlong: Son of Heaven, Man of the World. Longman Publishing
Group, Beijing (2009)
12. Goldstein, J.S.: International Relations. Longman Publishing Group, Beijing (2003)
13. Ni Picture of China, http://www.nipic.com
14. Wikipedia, http://en.wikipedia.org/wiki/Peking_opera#cite_note-13
15. Travel China Guide, http://www.travelchinaguide.com/intro/arts/beijing_opera/
16. Hu, Q.: Encyclopedia of China. Encyclopedia of China Publishing House, Beijing (1993)
17. The art of Beijing Opera, http://www.jingju.com/zhishi/index.html
18. Zhang, X., Sheng, X.: The Art of Beijing Opera facial makeup. World Publishing
Cooperation, Beijing (2002)
19. Vimeo, https://vimeo.com/43467406


Serious Games: 
Customizing the Audio-Visual Interface 


Bill Kapralos, Robert Shewaga, and Gary Ng 


Faculty of Business and Information Technology, 
University of Ontario Institute of Technology, 
Oshawa, Ontario, Canada L1H 7K4 
bill.kapralos@uoit.ca 


Abstract. Serious games are gaining in popularity within a wide range 
of educational and training applications given their ability to engage 
and motivate learners in the educational process. Recent hardware and 
computational advancements are providing developers the opportunity 
to develop applications that employ a high level of fidelity (realism) 
and novel interaction techniques. However, despite these great advances 
in hardware and computational power, real-time high fidelity rendering 
of complex virtual environments (found in many serious games) across 
all modalities is still not feasible. Perceptual-based rendering exploits
various aspects of the multi-modal perceptual system to reduce computational
requirements without any perceptible effect on the
resulting scene. A series of human-based experiments demonstrated a
potentially strong effect of sound on visual fidelity perception, and task 
performance. However, the resulting effects were subjective whereby the 
influence of sound was dependent on various individual factors including 
musical listening preferences. This suggests the importance of customiz- 
ing (individualizing) a serious game’s virtual environment with respect 
to audio-visual fidelity, background sounds, etc. In this paper details re- 
garding this series of audio-visual experiments will be provided followed 
by a description of current work that is examining the customization of 
a serious game’s virtual environment by each user through the use of a 
game-based calibration method. 


Keywords: Serious games, virtual simulation, audio-visual interaction, 
audio-visual fidelity, calibration. 


1 Introduction 


The use of serious games within a wide range of educational and training ap- 
plications, from military, health professions education, patient education, and 
business/corporate, amongst others, is becoming widespread particularly given 
the ubiquity of video game play by the current tech-savvy generation of learners. 
Recent hardware and computational advancements are providing designers and 
developers of serious games the opportunity to develop applications that employ 
a high level of fidelity/realism and novel interaction techniques using off-the-shelf
consumer level hardware and devices. Devices such as the Microsoft Kinect
motion sensing vision-based sensor allow users to interact with their application
using a natural user interface that employs gestures, thus eliminating the game
controller and the typically non-natural and potentially limiting interaction it
affords. For example, using the Kinect within a virtual operating room, surgery
trainees are able to perform their required tasks in a more intuitive manner that
better represents the real world (see [1]).

With respect to a simulation (including serious games), fidelity denotes the 
extent to which the appearance and/or the behavior of the simulation matches 
the appearance and behavior of the real system [2]. Despite the great computing 
hardware and computational advances we have experienced, real-time high fi- 
delity rendering of complex environments (found in many serious games) across 
all modalities is still not feasible [3]. Designers and developers of serious games, 
and virtual simulations in general, typically strive for high fidelity environments, 
particularly with respect to the visual (graphical) scene. However, evidence suggests
that high fidelity simulation does not always lead to greater learning [4], and
striving for high fidelity can burden our computational resources (particularly 
when the simulation is intended to be used on portable computing devices), in- 
crease the probability of lag and subsequent discomfort and simulator sickness 
[5], and lead to increased development costs. Previous work has examined the 
perceptual aspects of multi-modal effects (including audio-visual), and numerous 
studies have demonstrated that multi-modal effects can be considerable, to the 
extent that large amounts of detail of one sense may be ignored in the presence 
of other sensory inputs. Perceptual-based rendering, whereby the rendering pa- 
rameters are adjusted based on the perceptual system (typically vision), is often 
employed to limit computational processing. For example, it has been shown 
that sound can potentially attract part of the user’s attention away from the 
visual stimuli and lead to a reduced cognitive processing of the visual cues [6]. 
Therefore, if the enhancement of visuals within a virtual environment is eco- 
nomically or technically limited, one may consider increasing the quality of the 
audio channels instead [7]. 

Motivated by these studies and the general lack of emphasis on audition in vir- 
tual environments and games (where historically the emphasis has been placed 
on the visual scene [8]), we have begun investigating multi-modal (audio-visual) 
interactions within virtual environments (serious games, virtual simulations, and 
games). So far, a series of experiments that examined the direct effect of sound on 
engagement, the perception of visual fidelity (the degree to which visual features 
in the virtual environment conform to visual features in the real environment 
[9]), and task performance (the time required to complete a task within a virtual 
environment), of both static and dynamic 3D rendered (virtual) scenes in both 
stereoscopic 3D (S3D) and non-S3D viewing has been conducted. Although this series
of experiments has shown a strong influence of sound on visual fidelity,
engagement, and task performance, results have also shown strong subjective 
effects whereby the influence of sound is dependent on various individual factors 
including musical listening preferences. This suggests the importance of individualizing
(customizing) audio-visual fidelity, and the sounds employed within a
virtual environment to take advantage of perceptual-based rendering. Building 
upon the results of these experiments, we are examining the customization of the 
serious game’s virtual environment to each user via a novel game-based calibra- 
tion technique that will allow users to customize the virtual environment before 
they begin using the serious game. The calibration process will be used to tailor 
the settings of various simulation parameters including S3D settings (interax- 
ial settings), audio and visual fidelity, background sounds/sound effects, spatial 
sound settings (choosing head-related transfer functions from a pre-defined set, 
etc.), amongst others, to each user’s preferences. Such customization provides 
the opportunity to increase user engagement and ultimately learning. 


1.1 Paper Organization 


The remainder of this paper is organized as follows. In Section 2 a brief discussion 
of previous work (with an emphasis on the series of our own previously conducted 
experiments), is provided. Details regarding the calibration game are provided in 
Section 3 while a discussion, concluding remarks, and plans for future research 
are provided in Section 4. 


2 Background 


Various studies have examined the perceptual aspects of audio-visual cue inter- 
action, and it has been shown that the perception of visual fidelity can affect 
the perception of sound quality and vice versa. For example, Mastoropoulou
et al. [6] examined the influence of sound effects on the perception of motion 
smoothness within an animation and more specifically, on the perception of 
frame-rate, and infer that sound can attract part of the viewer’s attention away 
from any visual defects inherent in low frame-rates [6]. Similarly, Hulusic et 
al. showed that sound effects allowed slow animations to be perceived as 
smoother than fast animations and that the addition of footstep sound effects 
to walking (visual) animations increased the animation smoothness perception. 
Bonneel et al. [12] examined the influence of the level of detail of auditory and
visual stimuli on the perception of audio-visual material rendering quality and 
observed that the visual level of detail was perceived to be higher as the au- 
ditory level of detail was increased. Although there are various other relevant 
studies, for the remainder of this section, emphasis will be placed on our own 
previous work that has examined visual fidelity perception in the presence of various
auditory conditions. Greater details regarding the influence of sound over
visual rendering and task performance are provided by Hulusic et al. [3], while an
overview of “crossmodal influences on visual perception” is provided by Shams
and Kim [13].

Our studies began with simple static environments that consisted of a single 
2D image of a surgeon’s head (a rendered 3D model). In the first study, visual 
fidelity was defined with respect to the 3D model’s polygon count [14] while
in the second study, polygon count was kept constant and visual fidelity was
defined with respect to the 3D model’s texture resolution [15]. A sample of the
visual stimuli is provided in Fig. 1, where three renderings of the surgeon’s head,
each one with a constant polygon count but differing with respect to texture res- 
olution, are shown. In both studies, participants were presented with the static 
visual (a total of six visuals were considered, each differing with respect to poly- 
gon count or texture resolution depending on the experiment), in conjunction 
with one of four auditory conditions: i) no sound at all (silence), ii) white noise, 
iii) classical music (Mozart), and iv) heavy metal music (Megadeth). For each 
of the visuals, their task was to rate its fidelity on a scale from 1 to 7. With 
respect to polygon count, visual fidelity perception increased in the presence of 
classical music, particularly when considering images corresponding to higher 
polygon count. When considering texture resolution, sound consisting of white 
noise had very specific and detrimental effects on the perception of the quality 
of high-resolution images (i.e., the perception of visual quality of high fidelity 
visuals decreased in the presence of white noise). In contrast to the study that 
considered polygon count, sound consisting of music (classical or heavy metal) 
did not have any effect on the perception of visual quality when visual quality 
was defined with respect to texture resolution. 


Fig. 1. Sample of the visual stimuli used in a previous experiment that examined the
effect of sound on visual fidelity perception [15]. Here, each model of the surgeon’s head
contained the same polygon count but the texture resolution differed. 


These two experiments were repeated but now the visuals were presented in 
stereoscopic 3D [16]. When visual fidelity was defined with respect to polygon 
count, “classical music” led to an increase in visual fidelity perception while 
“white noise” had an attenuating effect on the perception of visual fidelity. 
However, both of these effects were evident for only the visual models whose 
polygon count was greater than 678 (i.e., auditory condition had no effect on 
the two smallest polygon count models), indicating that there is a polygon count 
threshold after which the visual distinction is not great enough to be negatively 
influenced by white noise. With visual fidelity defined with respect to texture 
resolution, both “classical music” and “heavy metal music” led to an increase in 
visual fidelity perception while “white noise” led to a decrease in visual fidelity 
perception. 
Although the results of these four studies show that sound can affect our perception
of visual fidelity, it is not known if this influence of sound is affected by
the introduction of contextually specific sounds. The auditory conditions con- 
sidered in our previous studies have been completely disjoint from the visuals. 
That is, there was no (direct) relationship between the auditory and visual cues 
(they were non-contextual). Two experiments were thus conducted to exam- 
ine visual fidelity perception, defined with respect to texture resolution, in the 
presence of contextual sounds, that is, sounds that had a causal relationship to 
the visual cues [16] [17]. The visual stimuli consisted of six images of a surgeon 
holding a surgical drill against a black background (similar to the visuals employed
in the previous experiment shown in Fig. 1 but with the addition of the
surgeon’s upper body). The polygon count of the 3D model was kept constant 
but as with the previous experiment, the texture resolution of the surgeon and 
the drill was varied. The auditory conditions included the four non-contextual 
auditory conditions considered in the previous experiments in addition to the 
following three contextual sounds: i) operating room ambiance which included 
machines beeping, doctors and nurses talking, ii) drill sound, and iii) hospital 
operating room ambiance coupled (mixed) with the drill sound. The visuals re- 
mained static in both experiments but in the second experiment, stereoscopic 3D 
viewing was employed. With non-S3D viewing, results suggest that contextual 
auditory cues increase the perception of visual fidelity while non-contextual cues 
in the form of white noise lead to a decrease in visual fidelity perception,
particularly when considering the lower fidelity visuals [17]. However, the increase
in visual fidelity perception was observed for only two of the three contextual 
auditory conditions and more specifically, for the operating room ambiance, and 
operating room ambiance + drill auditory conditions and not for the drill au- 
ditory condition despite the fact that the surgeon within the visual scene was 
holding a surgical drill. With respect to S3D viewing, “white noise” led to a de- 
crease in visual fidelity perception across all of the visuals considered. However, 
none of the auditory conditions led to a statistically significant increase in visual 
fidelity perception [16]. That being said, none of the participants were surgeons 
or medical practitioners and may not have been familiar with an operating room 
and the sounds contained within an operating room. The notion of contextual 
auditory cues may also be subjective and may depend on prior experience and 
musical listening preferences. 

The experiments described so far considered static visual environments where 
the visual scene (the 3D models presented to the participants), remained static. 
Two additional experiments were conducted to examine the effect of sound on 
visual fidelity perception, and task performance in dynamic virtual environments 
where the participants had to interact with the environment while completing a
simple task. In both experiments, participants were presented with a virtual 
operating room and their task was to navigate through the virtual operating 
room from their starting position to a point in the room which contained a 
tray with surgical instruments (see Fig. 2). Once they reached the tray, they
were required to pick up a surgical drill (they had to navigate around a bed 
and a non-player character nurse to reach the tray that contained the surgical
instruments). In one of the experiments, visual fidelity was defined with respect 
to the level of (consistent) blurring of the entire screen (level of blurring of the 
scene was used to approximate varying texture resolution), and the auditory cues 
consisted of the three contextual cues considered in the previous experiments in 
addition to white-noise and no sound. Sound (contextual and non-contextual), 
did not influence the perception of visual fidelity irrespective of the level of 
blurring. However, sound did impact task performance (defined as the time
required to complete the task). More specifically, white noise led to a large
decrease in performance (increase in task completion time) while contextual
sound improved performance (decrease in task completion time), across all
levels of visual fidelity considered. In the second experiment [18], visual cues 
consisted of: i) original (no effect), ii) cel-shading with three levels (i.e., color is 
divided into three discrete levels), and iii) cel-shading with six levels (i.e., color is 
divided into six discrete levels). The contextual auditory conditions consisted of: 
i) no sound (visuals only), ii) monaural (non-spatial) surgical drill sound, and iii) 
spatialized surgical drill sound. In contrast to the last study, in this experiment, 
spatial sound (acoustical occlusion and reverberation) was considered. Contrary 
to our previous work, the presence of sound (spatial and non-spatial) did not 
have any effect on either visual fidelity perception or task completion time. That 
being said, only six participants took part in the experiment (in contrast to 18 
for each of our previous experiments), thus the results are preliminary. 
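Cel-shading with three or six levels amounts to quantizing color into a fixed number of discrete steps. The Python functions below are a minimal CPU-side sketch of that quantization, assuming per-channel color values in [0, 1] and evenly spaced bands; the calibration game performs cel-shading dynamically in a Unity shader, and the exact banding used there and in the experiment is not specified, so this is an illustration rather than the shader actually used.

```python
import math
from typing import Tuple


def quantize(value: float, levels: int) -> float:
    """Snap a value in [0, 1] to one of `levels` evenly spaced steps (0 and 1 included)."""
    if levels < 2:
        raise ValueError("need at least two levels")
    v = min(max(value, 0.0), 1.0)
    # Which of the `levels` equal-width bands the value falls in (1.0 goes in the top band).
    band = min(math.floor(v * levels), levels - 1)
    return band / (levels - 1)


def cel_shade(rgb: Tuple[float, float, float], levels: int) -> Tuple[float, ...]:
    """Apply per-channel quantization to an (r, g, b) color with components in [0, 1]."""
    return tuple(quantize(c, levels) for c in rgb)
```

For example, `cel_shade((0.1, 0.5, 0.9), 3)` maps the three channels to the dark, middle, and bright bands respectively; fewer levels give a flatter, lower-fidelity look.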




Fig. 2. View of the virtual operating room environment used in two previous experiments
[18] [19]. The task of each participant was to navigate the environment from the
starting position to the position of the surgical drill and then “choose” the drill. 


2.1 Summary of Our Experimental Results 


A total of eight experiments were conducted that examined the effect of sound 
on visual fidelity perception under a variety of conditions including static and 
dynamic environments, and stereoscopic 3D viewing. Results varied considerably
across the experiments, making it difficult to reach any firm consensus.
However, it is clear that white noise generally led to a decrease in the perception
of visual fidelity and task performance, classical music led to an increase in visual 
fidelity perception (the majority of the time), and that the influence of sound 
on visual fidelity perception is very subjective. The visuals and many of the 
sounds considered in these experiments were medical in nature (e.g., surgeon, 
operating room, operating room ambiance sounds, drill sounds), yet many of the 
participants were students and although some were enrolled in Health Sciences- 
related programs, they had limited (if any), operating room exposure. 

We hypothesize that the variation seen across the results of these experiments 
is due to subjective factors. Prior to the start of each experiment, participants 
were asked to complete a brief questionnaire regarding their video game play 
habits and game and musical preferences. A detailed analysis of the question- 
naire that will examine whether any correlations exist between game/musical 
preferences and the experimental results is currently underway to confirm our 
hypothesis. However, informally, there does appear to be a relationship between 
musical genre preference and the influence of music on visual fidelity perception. 
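The planned questionnaire analysis could start with a simple correlation between a preference measure and an experimental outcome. As a hypothetical example, x might be each participant's self-rated liking of classical music and y the difference between their mean fidelity rating with classical music and in silence. The sketch below computes Pearson's r for such paired values; the choice of variables is an assumption, not the analysis the authors describe.

```python
import math
from typing import Sequence


def pearson_r(xs: Sequence[float], ys: Sequence[float]) -> float:
    """Pearson correlation coefficient between two paired samples of equal length."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two samples of equal length >= 2")
    mx = sum(xs) / n
    my = sum(ys) / n
    # Co-variation around the means, and the spread of each sample.
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    if sx == 0 or sy == 0:
        raise ValueError("a sample with zero variance has no defined correlation")
    return cov / (sx * sy)
```

Because the fidelity ratings are ordinal 1 to 7 scores, a rank-based measure such as Spearman's rho may be the more appropriate choice in practice, and with 18 participants per experiment any single coefficient should be read cautiously.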

The variation observed across the results of all experiments and the poten- 
tial consequences this variation may have on perceptual-based rendering and 
ultimately learning when considering serious games, motivated our work in the 
customization of audio-visual fidelity through a user-calibration method. This 
involves the use of a brief questionnaire that users complete prior to beginning 
the serious game followed by an interactive “calibration game” whereby the op- 
timal audio-visual fidelity settings are determined dynamically by the player in 
the process of playing a game. How the questionnaire responses will be used will 
depend on the results of a meta-analysis that will be conducted on the results of 
our previous experiments but they may be used to drive the calibration game. 
Greater details regarding the calibration game are provided in the following 
section. 


3 The Calibration Game: Calibration of Visual Fidelity 


Although the audio-visual interface could be customized using the results of a
questionnaire presented to each user (which may include visuals and audio clips), here
customization of the audio-visual interface is accomplished using a simple game-based
approach, making the process interactive and far more engaging. Our approach is
inspired by standard testing methodologies employed by optometrists
to determine the optimum properties of corrective lenses in order to overcome a 
variety of visual deficiencies [20]. 

The calibration game presents the user with a split screen with the same game 
running in each window but under different fidelity/realism settings (see Fig. 3
for an example), with a single background sound. The player chooses the screen 
they prefer by clicking a button just above the corresponding window. Their 
choice will be registered and the audio-visual fidelity of the game running in the 
other window will change (increase or decrease). This process will be repeated 
over a number of cycles (the total number of cycles can be easily modified), until 


Fig. 3. Visual calibration game sample. Two versions of a game running in each win- 
dow, each differing with respect to visual fidelity. The user then chooses which they 
prefer using one of the two selection buttons. 


the optimal fidelity level is reached. Currently, the game used is a Bomberman-style
strategic, maze-based video game in which the player completes levels of the game
by strategically placing bombs in order to kill enemies and destroy obstacles. 
The game is controlled using the ‘W’, ‘A’, ‘S’, ‘D’ keys to move the character 
(bomber), and bombs are placed by pressing the space bar. Both windows repre- 
sent the same game-play (i.e., any actions to move the character or place a bomb 
will happen simultaneously in both windows). The calibration game was imple- 
mented using the Unity Game Engine and currently fidelity is defined by levels 
of cel-shading performed dynamically using a Unity shader. Although formal 
testing will follow, an informal test conducted with three participants revealed 
that the calibration game is easy to use and fun/enjoyable. 
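The paper does not specify how the fidelity of the non-preferred window is updated after each choice. One plausible rule, loosely following an optometrist's "which is better" refinement, is sketched below in Python: the preferred level is kept, the other window is moved one step up or down, and the step size is halved once both neighbors have been rejected. Fidelity is assumed to be an integer level (for example, a number of cel-shading bands), and the `choose` callback stands in for the player's button press; all names and the halving rule are hypothetical.

```python
from typing import Callable

# choose(a, b) shows fidelity levels a and b side by side and returns whichever
# one the player selected (a stand-in for the two window-selection buttons).
Chooser = Callable[[int, int], int]


def calibrate(choose: Chooser, lo: int, hi: int, start: int, max_cycles: int = 12) -> int:
    """Hypothetical pairwise-preference search over integer fidelity levels lo..hi."""
    best = start                      # level shown in the preferred window
    step = max(1, (hi - lo) // 4)     # initial (coarse) step size
    direction = 1                     # +1 = try higher fidelity, -1 = lower
    misses = 0                        # consecutive rejected candidates at this step
    for _ in range(max_cycles):
        # The non-preferred window gets a new level one step away from `best`.
        candidate = min(max(best + direction * step, lo), hi)
        if candidate != best and choose(best, candidate) == candidate:
            best = candidate          # candidate won: keep moving the same way
            misses = 0
            continue
        # Candidate lost (or was clamped onto `best`): try the other direction.
        direction = -direction
        misses += 1
        if misses == 2:               # both neighbors rejected at this step size
            if step == 1:
                break                 # cannot refine further: converged
            step //= 2
            misses = 0
    return best
```

For example, a simulated player who always prefers the level closer to 7 on a 0 to 10 scale reaches 7 within eight cycles when started at level 0. Keeping the winning level on screen and changing only the other window matches the procedure described above, and `max_cycles` mirrors the adjustable number of cycles.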


4 Discussion and Concluding Remarks 


Prior work has demonstrated that the influence of sound on the perception of 
visual fidelity, and task performance within a virtual environment is complex 
and subjective, depending on a user’s prior experience, and musical preference. 
However, this is rarely exploited as the vast majority of serious games take a 
“one-size-fits-all” approach with respect to audio-visual fidelity and the choice 
of background sounds and sound effects. Here, preliminary details of a novel 
“calibration game” being developed to custom-tailor the fidelity of the visuals 
within a serious game were provided. The game itself was inspired by standard 
optometrist testing and prior work that used a similar approach to determine the 
optimal interaxial distance of a stereoscopic 3D (S3D) game and found the method
to be effective [21]. Currently, fidelity is defined with respect to cel-shading
implemented using the Unity Game Engine; this was done as a proof-of-concept
to demonstrate the feasibility of such an approach, and future work will examine
other definitions of visual fidelity.

The work presented here is part of a larger initiative whose goal is to develop a greater understanding of visual fidelity, multi-modal interactions, perceptual-based rendering, user-specific factors, and their effect on learning. Although much work remains, giving users the opportunity to customize the virtual environment of their serious game before using it will ultimately help us develop more effective serious games. Future work will see continued development and refinement of the calibration game. This will include experimenting with other games beyond the one presented here and conducting further experiments that examine audio-visual interactions and perceptual-based rendering. Furthermore, a meta-analysis of the results of our previous experiments on audio-visual interactions (and of any subsequent experiments) will be conducted to identify patterns or relationships among the study results and to determine the most favorable fidelity settings to include in the calibration game. Future work will also extend the calibration game to support additional definitions of visual fidelity, including polygon count and texture resolution, followed by a usability study to examine the effectiveness of the calibration game.
Abstract. We propose a real-time interactive game based on Augmented Reality (AR). The system is composed of an AR marker, a Head Mounted Display (HMD), and a depth camera. Using the marker, the system augments the game space, a fishing place, and the player can interact with virtual game objects such as bait or fish with bare hands, based on computer vision. The rapid development of AR technologies has raised profound interest in the design of AR games, but existing games have not provided game environments that feel real, because the way games are played remains the same when the platform changes. In addition, studies in this field have not fully utilized AR technologies, so the inherent characteristics of AR games neither shape the user experience nor draw explicit attention in design concepts. Our system gives users the experience of grasping virtual objects, and it can be applied to various game contents that feel real.


Keywords: entertainment, augmented reality, 3D interaction, HMD, hand-tracking.


1 Introduction 


1.1 Background


The user interface design environment is known to be one of the significant elements of a game system [1]. Nowadays, the use of traditional game interfaces such as joysticks, mice, and keyboards is declining every year [2], and game systems have started to build AR-based interfaces [3]. These new trends in game interface design have brought about a spatial transformation. For example, people can play games while walking, and game environments have moved from 2D into 3D, which means that the game space extends beyond the monitor. Players also expect new experiences that they cannot get from existing or similar forms of games [4].


1.2 Problem-Posing


In response to players’ demands, game console makers have made efforts to mix reality and virtual reality, developing the Nintendo ‘Wii’, Sony ‘Move’, and Microsoft ‘Kinect’ [5]. The AR games developed for these platforms were based on motion recognition technologies [6].
However, there are some problems. First, these devices recognize only large motions such as dancing or boxing. Thus, in small residential spaces or public places such as apartments and cafés, people have trouble using them. Also, people wearing heavy clothes or skirts that hide body parts cannot use them, because the devices cannot recognize the gamers [7]. Second, the systems embedded in these devices have difficulty handling small motions or elaborate control. The third problem is game controllers. Existing systems require traditional game controllers, which means that people who sit far from the game device cannot play. In addition, users who want to play a tennis or guitar game must buy a tennis racket controller or guitar controller made for the specific device [8]. Finally, gamers cannot interact with game objects in 3D space; the game objects remain locked inside the monitor.


1.3 System Overview 


Our system provides players with a distinctive experience in which they can enter the game space. We made a fishing game and developed all the components needed to play it in AR. A player fishes with virtual bait and a virtual hook that are controlled by the player’s hands and motion [9], which means that players can touch and interact with augmented objects using bare hands. Fig. 1 shows the game space built by our system.


Fig. 1. Scene of our system: the AR fishing game as seen by the user through an HMD. We developed the game environment and all objects in Unity.


2 System Design 


2.1 Game Scenario 


Fishing in the real world is an outdoor activity. To give users a realistic-looking fishing place, we built the game space in 3D rather than 2D, as shown in Fig. 1. With the HMD and the marker, users can face the virtual fishing place anywhere. Fig. 2 summarizes the system scenario.

We replicated interactions similar to those used in real fishing techniques, and these interactions control the system. For example, the motion of harpooning is used to catch fish in our system. There are several techniques for catching fish: hand gathering, spearing, netting, angling, and trapping. Among these, we focused on implementing spear fishing, an ancient method that has been used throughout the world, so it is intuitive to play and users can learn how to play quickly [10].


[Fig. 2 panels: (1) Select a bait; (2) Move one hand to catch a fish; (3) Pick the caught fish with the other hand; (4) Put the fish in the basket]


Fig. 2. The game scenario. These steps progress while the user wears an HMD.


2.2 Game Flow 


We designed the following interactions so that the system provides the user with immersion. (a) When the user sees the marker, the game space, a virtual fishing hole, is augmented onto real space. (b) When the user then opens his/her palm in front of the HMD, the system is ready to start the AR fishing game, and a fishing tool such as a rod or spear is augmented at the user’s fingertip. According to [11], hand tracking can be performed in real time. Thus, (c) the user can catch fish directly with bare hands where the virtual fishing tool hangs and put them in the basket.

We added some entertainment and sport fishing elements. For pleasure and competition, we added a recreational element, a time limit: the player should catch as many fish as possible within the limited time.
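The game flow in steps (a) to (c), together with the time limit, can be summarized as a small state machine. This is a hypothetical sketch: the state names, the event strings, and the `FishingSession` class are illustrative assumptions, and the 120-second default only mirrors the 2-minute limit used in the evaluation.

```python
from enum import Enum, auto


class State(Enum):
    WAITING_FOR_MARKER = auto()  # (a) game space not yet augmented
    WAITING_FOR_PALM = auto()    # (b) waiting for an open palm in front of the HMD
    PLAYING = auto()             # (c) fishing tool attached to the fingertip
    FINISHED = auto()            # time limit reached


class FishingSession:
    def __init__(self, time_limit_s=120.0):
        self.state = State.WAITING_FOR_MARKER
        self.time_limit_s = time_limit_s
        self.start_time = None
        self.fish_in_basket = 0

    def on_event(self, event, now):
        """Advance the session on an input event; `now` is a timestamp in seconds."""
        # The time limit is checked first, so late catches are not counted.
        if self.state is State.PLAYING and now - self.start_time >= self.time_limit_s:
            self.state = State.FINISHED
        if self.state is State.WAITING_FOR_MARKER and event == "marker_seen":
            self.state = State.WAITING_FOR_PALM
        elif self.state is State.WAITING_FOR_PALM and event == "palm_open":
            self.state = State.PLAYING
            self.start_time = now
        elif self.state is State.PLAYING and event == "fish_in_basket":
            self.fish_in_basket += 1
        return self.state
```

Events that arrive out of order (for example an open palm before the marker is seen) are simply ignored, which keeps the flow strictly in the (a), (b), (c) order.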


Fig. 3(a). Seeing the marker through the HMD. Left: the view of the marker. Right: the game space, with all objects augmented.
Fig. 3(c). Playing the game. The user can play with the system anywhere.


3 Implementation 


3.1 Hardware


Our augmented reality system runs on Windows 7 and requires an RGB-D camera and an HMD. The HMD is an Accupix Mybud with a resolution of 852 x 480 pixels per eye and a horizontal field of view of approximately 35°. The RGB-D camera is the Intel Creative Interactive Gesture Camera, with a color resolution of 1280 x 720 pixels and a depth resolution of QVGA (320 x 240). The software was written in C# using the Vuforia SDK in Unity3D.


3.2 Flow of System 


The proposed system is divided into four procedures: capturing the RGB-D image, setting the environment, hand gesture recognition, and rendering, as shown in Fig. 4.
[Fig. 4 blocks: Capturing (RGB-D image from the HMD and depth camera); Environment Setting (image target tracking giving the pose (RT) of the object; hand tracking giving the position of the fingertip; coordinate calibration; gesture recognition giving the gesture of the hand); Rendering (AR basket; AR sea, fish, and bait; AR fishing rod; AR content to the HMD and sound effects to the speaker)]


Fig. 4. System flow chart


3.3 Interaction Techniques


Targeting. Vuforia uses sophisticated algorithms to detect and track natural features in an image. The Vuforia SDK recognizes the image target by comparing these natural features against a known target resource database [12]. When the image is detected, the simulation environment appears on the image.

The overall pattern of data flow within the software is shown in Fig. 4. Recall that the Creative Interactive Gesture Camera serves as the RGB-D camera. The data from the RGB-D camera allow us to exploit various kinds of interaction [13]. First, we can use image targeting to create a coordinate frame for the image and then build the fishing environment for the user. Second, we can use the hand tracking data to create a coordinate frame for the hand and track the direction in which the hand points. Both coordinate frames can be oriented in the same sense (Y is up, X to the right, and -Z into the screen or away from the user; roll is rotation around Z, pitch around X, and yaw around Y).
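The shared frame convention (Y up, X right, -Z into the screen; roll about Z, pitch about X, yaw about Y) can be made concrete with rotation matrices. The sketch below assumes the composition R = R_y(yaw) · R_x(pitch) · R_z(roll) applied to column vectors; the text does not state an order, so treat it as an illustration of the axis assignments only. All function names are hypothetical.

```python
import math


def rot_x(a):
    """Pitch: rotation by angle a (radians) about the X axis."""
    c, s = math.cos(a), math.sin(a)
    return [[1.0, 0.0, 0.0], [0.0, c, -s], [0.0, s, c]]


def rot_y(a):
    """Yaw: rotation about the Y (up) axis."""
    c, s = math.cos(a), math.sin(a)
    return [[c, 0.0, s], [0.0, 1.0, 0.0], [-s, 0.0, c]]


def rot_z(a):
    """Roll: rotation about the Z axis."""
    c, s = math.cos(a), math.sin(a)
    return [[c, -s, 0.0], [s, c, 0.0], [0.0, 0.0, 1.0]]


def matmul(a, b):
    return [[sum(a[i][k] * b[k][j] for k in range(3)) for j in range(3)] for i in range(3)]


def apply(m, v):
    return tuple(sum(m[i][k] * v[k] for k in range(3)) for i in range(3))


# -Z points into the screen, i.e. away from the user.
FORWARD = (0.0, 0.0, -1.0)


def orientation(yaw, pitch, roll):
    """Assumed composition order: R = R_y(yaw) @ R_x(pitch) @ R_z(roll)."""
    return matmul(rot_y(yaw), matmul(rot_x(pitch), rot_z(roll)))


def pointing_direction(yaw, pitch, roll):
    """Direction of the frame's forward axis after rotation."""
    return apply(orientation(yaw, pitch, roll), FORWARD)
```

With this convention a yaw of +90 degrees turns the forward axis toward -X (the user's left), a pitch of +90 degrees turns it toward +Y, and roll leaves the forward axis unchanged, so a hand's pointing direction and objects placed relative to the image target can be compared in one frame.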


Tracking. The hand is recognized through the camera. The user can move freely inside the interaction area in front of the Creative Interactive Gesture Camera. The tracking itself is accomplished by a single camera system attached to the HMD. The visualization is based on the camera system and is rendered in HD quality on the HMD. The following paragraphs explain the components:

Hand tracking was needed only for the viewpoint. We selected a pragmatic and inexpensive solution: a Creative Interactive Gesture Camera that detects the nearest hand in front of the camera. The viewpoint (the virtual camera in the 3D world) is moved accordingly to provide an immersive depth cue. Additionally, for selected levels, the user is moved to another position in the 3D world [14].
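The camera is described as detecting "the nearest hand". One simple way to approximate this with raw depth data is to take the closest valid pixel and keep every pixel within a small depth band behind it. This is a hypothetical sketch, not the camera SDK's algorithm: the 150 to 1000 mm working range, the 80 mm band, and the function name `nearest_blob` are assumptions.

```python
def nearest_blob(depth_mm, near=150, far=1000, band=80):
    """Return the (row, col) centroid of the nearest object in a depth frame, or None.

    depth_mm: 2D list of depth values in millimetres (0 means no reading).
    Pixels outside [near, far] are ignored; the nearest object is assumed to be the hand.
    """
    valid = [
        (d, r, c)
        for r, row in enumerate(depth_mm)
        for c, d in enumerate(row)
        if near <= d <= far
    ]
    if not valid:
        return None
    closest = min(d for d, _, _ in valid)
    # Keep pixels within `band` mm of the closest reading.
    blob = [(r, c) for d, r, c in valid if d <= closest + band]
    row_mean = sum(r for r, _ in blob) / len(blob)
    col_mean = sum(c for _, c in blob) / len(blob)
    return row_mean, col_mean
```

On a 320 x 240 QVGA depth frame this is a single linear pass; a real system would add smoothing over frames and a size check so that noise or a single stray pixel is not taken for a hand.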

For more natural motion, we removed the controller, which disturbs immersion and the sense of realism. Without cumbersome supplementary devices, users can interact naturally while they enjoy the game.
4 Evaluation


4.1 Objective 


The objective of this study was to build a system with which users interact with virtual objects in 3D space using bare hands. To test the effectiveness and usefulness of our system, that is, how comfortable it is to use and how easy it is to play, we measured what users felt using qualitative and quantitative methods. We then analyzed the results and adjusted the system to maximize user satisfaction and immersion.


4.2 Procedure 


Twenty-four colleagues (12 female and 12 male, aged 22 to 28, students of the graduate school of culture and technology) evaluated our system. None had previous experience with a fishing game on an HMD. Thus, we gave all participants a short introduction (3 min), which included watching a video (1 min) and trying one practice game.


Qualitative Measure. We collected participant comments in a post-experiment interview to gain further information. We also handed out a questionnaire, shown in Fig. 5, to the 24 subjects. The odd-numbered questions concern ease of playing and the even-numbered questions concern difficulty of playing. A response of strong agreement scored 5 points; otherwise, 0 points were scored.
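The questionnaire items in Fig. 5 appear to follow the wording of the standard System Usability Scale (SUS). For comparison with the binary scoring described above, the sketch below implements the conventional SUS scoring rule (responses from 1 to 5; odd items contribute response - 1, even items contribute 5 - response; the sum is multiplied by 2.5 to give a score from 0 to 100). This is the standard SUS procedure, not the scoring the authors report using, and `sus_score` is a hypothetical name.

```python
def sus_score(responses):
    """Standard SUS score (0-100) from ten responses on a 1-5 agreement scale."""
    if len(responses) != 10:
        raise ValueError("SUS needs exactly 10 responses")
    if any(not 1 <= r <= 5 for r in responses):
        raise ValueError("responses must be in 1..5")
    total = 0
    for i, r in enumerate(responses, start=1):
        # Odd items are positively worded (ease), even items negatively worded (difficulty).
        total += (r - 1) if i % 2 == 1 else (5 - r)
    return total * 2.5
```

Because the even-numbered items are reverse-scored, a participant who strongly agrees with every ease item and strongly disagrees with every difficulty item reaches the maximum of 100.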


Quantitative Measure. The quantitative measure was the number of fish each user caught during a test with a time limit of 2 minutes. This experiment was used to adjust the composition of the system; for example, it would help us determine the most suitable size for the augmented objects.


[Fig. 5 questionnaire; odd-numbered items concern ease of use, even-numbered items difficulty of use]
1) I think that I would like to use this system frequently
2) I found the system unnecessarily complex
3) I thought the system was easy to use
6) I thought there [...] this system
7) I would imagine that most people would learn to use this system very quickly
8) I found the system very cumbersome to use
9) I felt very confident using the system
10) I needed to learn a lot of things before I could get going with this system


Fig. 5. The questionnaire and result
4.3 Result and Analysis


The results of the questionnaire are summarized in Fig. 5. As the results show, most people strongly agreed that they could easily understand how to use the system (95.8%). The question with the lowest score concerns whether people need to learn something before using this system, meaning that people think almost nothing, such as a tutorial or the mechanics of a controller, needs to be learned to play our system. In addition, 22 subjects responded to the question ‘is it easy?’ with strong agreement.

The results of the experiment are shown in Fig. 6. Subjects caught 4.8 fish on average within 2 minutes, and all subjects caught at least 3 fish. 25% of subjects cleared the game, so the difficulty of the system needed to be modified.


[Fig. 6 chart: x, the number of fish caught (0 to 6); y, the number of subjects]


Fig. 6. Result of the experiment. The time limit is 2 minutes.
4.4 Refinements of System


Color Notification. We collected qualitative data from the questionnaire, comments, and observations, and obtained further comments during semi-structured interviews. Some subjects said that the visual superimposition of real and virtual content in 3D space made it confusing to distinguish the two, so they could not recognize where to interact. We therefore added guidance that indicates whether the user is interacting. For this purpose, we change the color of augmented objects: as Fig. 7 shows, when the user catches an augmented fish, the fish changes color.


Fig. 7. Color is changed if a fish is caught 
Reducing Size. To tune the difficulty, we scaled the length, height, and width of the augmented fish down to about 0.18 times their original size. After the fish were made smaller, people tended to take longer to catch a fish. The result is shown in Fig. 8: subjects caught 2.3 fish on average.


Table 1. Changing the size of the fish, and the result of the experiment. We scaled down the fish game object; the modified fish is about one fifth the size of the original. Subjects tried to catch a fish 3 times on average, three times as many as before.


                          Before              After
Fish scale                x=0.2 y=0.4 z=1     x=0.05 y=0.05 z=0.2
Number of tries           (illegible)         3


[Fig. 8 chart: x, the number of fish caught (0 to 6); y, the number of subjects]


Fig. 8. Result of the experiment after scaling down
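The "0.18 times" in the text and the "one fifth" in Table 1 can be related through the per-axis values in Table 1, which do not shrink uniformly. The short calculation below computes each axis ratio and their geometric mean, the uniform factor that would give the same change in volume. Reading the reported 0.18 as this geometric mean is an assumption; the text does not say how the figure was obtained.

```python
import math

before = {"x": 0.2, "y": 0.4, "z": 1.0}   # Table 1, original fish scale
after = {"x": 0.05, "y": 0.05, "z": 0.2}  # Table 1, reduced fish scale

# Per-axis scale factors: x 0.25, y 0.125, z 0.2.
ratios = {axis: after[axis] / before[axis] for axis in before}
# Overall volume scale is the product of the per-axis factors.
volume_ratio = math.prod(ratios.values())
# Equivalent uniform per-axis factor.
geometric_mean = volume_ratio ** (1 / 3)

print(round(volume_ratio, 5))    # 0.00625
print(round(geometric_mean, 2))  # 0.18
```

The volume shrinks to 1/160 of the original, and the equivalent uniform factor is about 0.184, which is consistent with both "0.18 times" and the rougher "one fifth".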


5 Conclusion and Discussion 


In this paper, we introduced an AR fishing game that enhances user immersion and overcomes space limitations. Interacting with augmented objects using bare hands, as in our system, will strengthen the basis of how games are controlled. Our method is more powerful than conventional stationary video games that use cumbersome equipment, and than mobile games. Our AR fishing game method is applicable to many AR applications (e.g., future experimental education, urban planning, military simulation, and collaborative surgery).

The techniques we described could be applied to other sports simulations; to do so, different and more sophisticated interaction techniques should be explored and studied in more detail. The project also shows how interaction techniques in a 3D environment can be used to create entertainment.


6 Future Work 


To refine our design environment, different interaction techniques will be explored and studied in more detail. For example, if the user takes the natural posture of casting a fishing rod to catch fish, this gesture will be implemented as direct input to the system, and waving the hand will control how far the fishing rod casts.

For future work, we plan to use RGB-D image capture, including both color and depth information, for modeling and tracking. This will let players focus on the game with their bodies more than in traditional games. Moreover, we will make the movement of the augmented fish more realistic in specific situations and develop tracking that gives users a continuous experience whether or not the target remains visible in the HMD field of view. We will also investigate the applicability of these concepts to other sports simulations or games.
Abstract. THE GROWTH is an environmental game that aims to tackle growing population issues and their impact on the natural environment. The game also extends to social issues and the unsustainable resource consumption caused by rapid population growth. Unlike many environmental games, THE GROWTH demonstrates that financial, social, and health factors can be improved simply by committing to sustainable consumption patterns. The game aims to investigate the possibility of using serious games to promote players’ environmental awareness and, ultimately, to modify players’ consumption patterns. The game is designed for a specific target group, males between 20 and 30 years of age in Bangkok (Thailand), and focuses on environmental issues arising from residential accommodation. Early experimental sessions were conducted with 17 participants, and this paper presents the preliminary results of the study.


Keywords: Applications: Education, Applications: Entertainment, Applica- 
tions: Virtual worlds and social computing, Interaction and navigation in VR 
and MR: Immersion, serious games. 


1 Introduction 


The global human population exceeded 7.1 billion by the end of 2013 [1]. The trend of rapid population growth is projected to continue, with one study estimating that the global population will exceed 8 billion by 2030 [2]. From an environmental point of view, the growing human population puts pressure on the natural environment through increased rates of resource extraction and consumption [2]. For example, the expansion of the agricultural sector is one of the main drivers of deforestation [3]. At the same time, man-made hazardous wastes and pollutants are spreading into the environment, affecting natural habitats and wildlife [4]. The effects of pollution also reflect back on human habitats and contribute to health problems [5].

Apart from this environmental perspective, unsustainable population growth also poses social and well-being risks. Globally, about 1.2 billion people live in extreme poverty [6], and a number of populations worldwide live in vulnerable and diminished conditions [7].

Seeking to address these issues, THE GROWTH is an environmental game that aims to highlight the interrelationships among the natural environment, the economy, population, and the human community. Unlike many environmental games that use an "environmental-only" model, THE GROWTH uses financial incentives, health improvements, and social well-being as key messages to reach players who are not environmentally conscious.


2 Serious and Commercial Games with Environmental 
Characteristics 


Researchers have highlighted the possibility of using games and simulations to motivate the public (especially children and teenagers) and help them understand the importance of environmental conservation. Games and simulations can be used as supplemental material that lets learners gain knowledge in “a more interactive way” [8]. Players can also find games motivating [9]. Researchers have suggested that computer games can be used to promote awareness [8] and even alter behavior [10].

Currently, many computer games are designed to address environmental issues. For example, The CUSTOMER project (Coventry University Students’ Optimization and Management of Energy Resources) is an energy awareness game developed at Coventry University [11]. The target group of this game is students living in university accommodation or apartment complexes. The game is set in a standard student room, where players assume the role of a student in university accommodation. Players have to take key actions to minimize energy and water consumption without compromising their own health and safety. For example, players can put a laptop computer in stand-by mode or turn it off to save electricity, and they can turn off the lights in the bathroom and living room or close the window to prevent heat loss. Another serious game that tackles environmental issues is the BBC CLIMATE CHALLENGE game [12]. The game addresses global warming among other environmental and societal issues. Its goal is to set a target for CO2 reduction and devise policies to achieve that target. Some policies benefit the environment but also have a negative impact on the economy, so players have to balance multiple game factors. Another serious game in this genre is ENERGYVILLE [13]. In this turn-based game, players are given a pre-defined budget and must build power stations to satisfy the energy demands of the city. Several types of power station are available (e.g. nuclear, coal, natural gas, or wind power). Alternatively, players can invest in an energy conservation policy, which helps reduce energy demand. The game also includes random events; for example, oil conflicts and coal shortages can affect the energy future of the city.

Several studies have obtained interesting information and positive outcomes from serious games [14], [15]. While many games (both commercial and serious) are now based on ICT, a study has shown that a traditional paper-based game can also be an effective tool for addressing environmental issues [14]. Commercial games have also been designed around environmental issues. For example, ANNO2070, a game published by Ubisoft in 2011 [16], envisages a future in which the world has been largely affected by global warming, causing sea levels to rise and splitting landscapes into chains of small islands. The environmental factor plays a prominent role in this game: a positive environmental condition improves agricultural production and reduces health problems and disasters, while a critical environmental condition harms agricultural production and magnifies health problems and disasters.

Some commercial games also highlight the constraints among population size, natural resources, and the environment. Examples include FATE OF THE DRAGON [17] and STRONGHOLD: CRUSADER [18]. In FATE OF THE DRAGON, food supplies play a prominent role: players must maintain a food stock for the army. Large armies deplete food stocks rapidly, and food shortages severely reduce combat performance. The game also includes unpredictable natural disasters such as locust swarms (which severely affect agricultural output), fires, earthquakes, and disease outbreaks, all of which can affect an unprepared army.

In STRONGHOLD: CRUSADER, players must maintain food supplies, as in FATE OF THE DRAGON. Population morale can be increased by providing multiple food types and issuing large food rations. The game also demonstrates the effects of disease: a densely populated town is at higher risk of a disease outbreak.


3 CASE STUDY: THE GROWTH Serious Game: A SG 
Aiming to Increase the Environmental Awareness of the 
Players 


THE GROWTH is a single-player, role-playing serious game that seeks to promote environmental awareness among players. The game is currently being developed at Coventry University, United Kingdom. It is based on the current global trend of rapid human expansion, ecological degradation, and exhaustive resource consumption as the key drivers of environmental degradation. Apart from the environmental aspect, the game also attempts to demonstrate to players that unsustainable population growth has a negative impact on society and the economy.

This game is designed to target the Thai population specifically. The final version will be delivered in Thai, with an English version also available. About 50% of the content is based on local issues in Thailand. The decision to create a ‘game designed for a specific group’ is borrowed from Coyle (2005), who notes that the public is generally more attracted to and motivated by issues in their immediate surroundings [19]. However, because environmental problems can spread from one region to another, the public also needs to recognize and become aware of global environmental problems. Hence, around 50% of the issues in this game are based on global themes.

THE GROWTH suggests that other strategies should be used to convey messages to players. For example, Rose (2009) suggests that environmental messages should inform the public of the rewards that can be gained from environmental conservation [20]. To this end, a range of content in the game seeks to demonstrate to players that sustainable consumption can also reduce household expenditure (e.g. investing in water-efficient products saves both the environment and water bills, and unplugging electronic appliances after use avoids the cost associated with phantom load).

The game design approach was based on the Four-Dimensional Framework [21]. The four-dimensional framework (4DF) was developed to aid the evaluation, validation, and development of serious games. The framework outlines four dimensions: the learner dimension (user modeling and profiling, and the specified needs and requirements of learners), the pedagogic dimension (using associative, cognitive, and situative models), the representation of the game (the differing levels of interactivity, fidelity, and immersion required to support the learning objectives), and the context within which learning takes place (including disciplinary context, place of learning, and available resources). The author decided to use this framework because the 4DF mapping allows for end-to-end development, validation, and evaluation of the game using a participatory design approach. Furthermore, the ARCS Model of Motivational Design [22] was used as a guideline during game concept development. The setting of THE GROWTH provides a background story that shares many similarities with Earth: the world is experiencing rapid ecological degradation caused by rapid human expansion, deforestation, land transformation, resource extraction, and pollution. In THE GROWTH, humans have already appropriated about 78% of planetary resources. The remaining plant and animal species are being challenged by man-made pollution as well as global warming and climate change. Five years ago, a major industrial explosion (known as “The Event”) collapsed an entire district and released a massive amount of pollution into the air. This had major impacts on the remaining natural environment and coated most buildings and landscapes with orange-tinged polluted particles. The background story is presented to players as slides of static images and is accessible from the main menu.


Fig. 1. Game environment 


3.1 Role-Playing


The game puts players in the role of the newly elected president of a giant environmental group called “The Environmental Consortium” or TEC. TEC is in fact a front (covert) organization of the government that exists to combat environmental problems and conserve the region’s remaining natural resources. Apart from its affiliation with the government, TEC works closely with other major environmental organizations in the region.


3.2 Key Factors (resources) 


There are four key factors in this game: Population, Environment, Emergency Supply, and Wealth. Players lose the game if the environmental factor is depleted. The key characteristics of each factor are as follows:


• Population factor: represents the total population of the region. A large population causes the environmental factor to deteriorate at a faster rate. However, the region will experience economic collapse if the population falls below 30% of total capacity, and economic collapse greatly reduces players’ income. Players therefore have to balance population size to maintain a healthy natural environment and economy at the same time.

• Environmental factor: represents the environmental situation in the city. By default, the environmental factor slowly decreases to reflect degradation caused by human activities. However, it can decrease at various speeds. For example, at 35% of total capacity, the environmental factor decreases 15% faster (this represents the scenario in which environmental conditions start to deteriorate beyond recovery). Also, natural disasters occur much more frequently when the environmental factor is running low.

• Emergency Supply factor: represents supplies that players need to spend on special missions (e.g. in the event of famine or armed conflict).

• Wealth factor: represents accumulated wealth that players can spend on certain improvements and investments (discussed below). Also, a certain amount of wealth is automatically withdrawn from players’ treasury to be spent as humanitarian aid in case of disasters.
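The stated rules above (faster environmental decline with a larger population, a 15% faster decline once the environment reaches 35% of capacity, economic collapse below 30% population, and loss when the environment is depleted) can be sketched as a per-turn update. Everything beyond those rules is hypothetical: the `Region` class, the base decay rate, the linear population pressure term, the income values, and reading "at 35%" as "at or below 35%".

```python
from dataclasses import dataclass


@dataclass
class Region:
    population: float   # fraction of total capacity, 0..1
    environment: float  # fraction of total capacity, 0..1
    wealth: float


BASE_DECAY = 0.01       # hypothetical per-turn environmental decline
BASE_INCOME = 10.0      # hypothetical per-turn income
COLLAPSE_INCOME = 2.0   # hypothetical income during economic collapse


def tick(region):
    """Apply one turn in place; return False if the game is lost."""
    # Larger populations degrade the environment faster (assumed linear pressure).
    decay = BASE_DECAY * (1.0 + region.population)
    # Stated rule: at 35% of capacity the decline becomes 15% faster.
    if region.environment <= 0.35:
        decay *= 1.15
    region.environment = max(0.0, region.environment - decay)
    # Stated rule: below 30% population the economy collapses and income drops.
    income = COLLAPSE_INCOME if region.population < 0.30 else BASE_INCOME
    region.wealth += income
    # Stated rule: players lose when the environmental factor is depleted.
    return region.environment > 0.0
```

Written this way, the tension the text describes is explicit: lowering population slows environmental decline but, below the 30% threshold, cuts the income needed for upgrades.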


3.3 Player Actions: Overview


There are several actions that players can perform in this game. For example, players can set up a campaign to promote sustainability, which improves environmental conditions and decreases the population growth rate. Players can order a large amount of emergency supplies to be produced (though this comes at the cost of the environment). Players can also accumulate wealth, which can be invested in certain improvements.


3.4 Player Actions: Upgrade and Improvements 


Upgrades represent technologies and policies that players can research and implement in order to provide certain benefits for the region. For example, an ‘advanced public transportation network’ encourages the population to use public transport, which improves environmental conditions and also generates a small amount of wealth (from reduced energy imports). Another example is the ‘family planning upgrade’, which encourages parents to plan for an optimum family size (slowing population growth). Each upgrade requires players to invest resources (chiefly wealth). Upgrades and improvements can fail to be implemented, in which case players lose their investment. In addition, all upgrades provide only temporary benefits for the region, so they must be re-invested after a certain period of time. The majority of upgrades in this game are based on existing and emerging technologies and policies.

Upgrades can be further categorized into three types based on their characteristics: technology, policy and propaganda. Technology represents highly advanced devices that can help mitigate environmental and social problems (e.g. automation). Upgrades in this category are generally very expensive to develop and have a low to moderate chance of successful implementation. However, a successfully implemented technology can provide significant benefits to the region for a very long period of time. Policies, on the other hand, represent the laws imposed in the region. Upgrades in this category are generally moderately expensive to initiate. Several of them have a relatively low chance of successful implementation but can provide substantial benefits to the region for a long period of time. For example, the ‘Carbon tax policy’ can greatly improve environmental conditions since it influences the public’s consumption patterns in many ways; however, it has a relatively low chance of successful implementation owing to perceived public opposition. Lastly, propaganda represents players’ attempts to communicate with the public and request their cooperation. Upgrades in this category are generally inexpensive to initiate. Several of them have a relatively low to moderate chance of implementation but can provide significant benefits to the region for a moderate duration. Examples from this category include recycling programs, car pooling and energy conservation programs.

Apart from their beneficial effects, upgrades and improvements can trigger or prevent certain game events. For example, investing in the ‘Marine Conservation Program’ can prompt the public to donate a certain amount of wealth to support players’ environmental causes. Another example is the ‘Disease Control Program’, which reduces the occurrence of disease outbreaks in the region by 50%.
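The upgrade rules described above (an up-front wealth investment, a chance of failure in which the investment is lost, and a benefit that lasts only for a limited time) can be sketched as follows. This is a minimal Python illustration, not the game's actual implementation: the class and function names and all numeric values (costs, probabilities, durations) are hypothetical and only mirror the qualitative ordering given in the text.

```python
import random
from dataclasses import dataclass


@dataclass
class Upgrade:
    name: str
    kind: str            # "technology", "policy" or "propaganda"
    cost: int            # wealth invested up front (hypothetical values below)
    success_prob: float  # chance that implementation succeeds
    duration: int        # turns the benefit lasts before re-investment is needed


# Hypothetical examples reflecting the qualitative description:
# technology = very expensive, low-to-moderate success, very long benefit;
# policy = moderately expensive, relatively low success, long benefit;
# propaganda = inexpensive, low-to-moderate success, moderate benefit.
AUTOMATION = Upgrade("Automation", "technology", cost=5000, success_prob=0.35, duration=120)
CARBON_TAX = Upgrade("Carbon tax policy", "policy", cost=2000, success_prob=0.25, duration=80)
CAR_POOLING = Upgrade("Car pooling", "propaganda", cost=300, success_prob=0.45, duration=30)


def try_implement(upgrade: Upgrade, wealth: int, rng: random.Random) -> tuple[int, int]:
    """Invest in an upgrade; return (remaining wealth, turns of benefit gained).

    The cost is paid whether or not implementation succeeds, so a failed
    attempt simply loses the investment (0 turns of benefit).
    """
    if wealth < upgrade.cost:
        raise ValueError(f"not enough wealth for {upgrade.name}")
    wealth -= upgrade.cost
    if rng.random() < upgrade.success_prob:
        return wealth, upgrade.duration
    return wealth, 0


def tick(active_turns: int) -> int:
    """Advance one turn; when this reaches 0 the upgrade must be re-invested."""
    return max(active_turns - 1, 0)
```

For instance, `try_implement(CAR_POOLING, 1000, random.Random(42))` returns either `(700, 30)` or `(700, 0)` depending on the random draw; in both cases the 300 wealth is spent.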


3.5 Real Property 


Players can purchase real property in order to gain long-term benefits. As with the ‘Upgrades and Improvements’ function, players must first pay a capital investment (i.e. wealth). However, unlike ‘Upgrades and Improvements’, real properties provide players with a constant and permanent return on investment. Players can also improve the efficiency of buildings by ‘equipping’ them with certain improvements.

For example, suppose a player has recently purchased a high-rise apartment. This apartment now generates 300 points of wealth per second for the player. The player can impose restrictions on elevator usage in this building, which earns 1 environmental point per second in return (for energy conservation).
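The apartment example amounts to simple per-second accrual. A minimal sketch, assuming (hypothetically) that each piece of equipment adds a fixed per-second bonus and that returns accrue linearly with time; the class and function names are illustrative, not the game's code:

```python
from dataclasses import dataclass, field


@dataclass
class RealProperty:
    name: str
    wealth_per_s: float                  # permanent return on the capital investment
    env_per_s: float = 0.0               # environmental points per second
    equipment: list[str] = field(default_factory=list)

    def equip(self, item: str, wealth_delta: float = 0.0, env_delta: float = 0.0) -> None:
        """Attach an improvement that changes the property's per-second yield."""
        self.equipment.append(item)
        self.wealth_per_s += wealth_delta
        self.env_per_s += env_delta


def accrue(properties: list[RealProperty], seconds: float) -> tuple[float, float]:
    """Total (wealth, environmental points) produced by all properties over `seconds`."""
    wealth = sum(p.wealth_per_s for p in properties) * seconds
    env = sum(p.env_per_s for p in properties) * seconds
    return wealth, env


apartment = RealProperty("high-rise apartment", wealth_per_s=300)
apartment.equip("elevator usage restriction", env_delta=1)
print(accrue([apartment], 60))  # (18000, 60.0)
```

Over one minute the equipped apartment therefore yields 300 × 60 = 18,000 wealth and 60 environmental points.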
3.6 Player Actions: Campaigning (Quiz)


Campaigning represents players’ attempts to promote environmental and social awareness among the public. It takes the form of a series of quizzes in which players read a random article and match it with the corresponding category. A successful match reflects players having successfully instilled awareness in the public, and it provides players with a population-reduction, environmental or wealth bonus. For example, a random article describes the recent high rate of unplanned pregnancies and venereal disease infections among Thai teenagers. The player marks this article as a ‘social issue’. By solving this quiz, the player decreases population growth in the city for a short period. All articles in this game are drawn from real-world events (from both local and global perspectives).
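The quiz logic reduces to checking the player's category against the article's category and, on success, granting a short-lived bonus. In the sketch below only the ‘social issue’ to slower-population-growth mapping comes from the text; the other category names, the bonus durations and the function names are hypothetical.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Article:
    title: str
    category: str  # the correct category for this real-world article


# Only "social issue" -> slower population growth is taken from the text;
# the remaining categories and all durations (in seconds) are hypothetical.
BONUS_BY_CATEGORY = {
    "social issue": ("population_growth_reduction", 30),
    "environmental issue": ("environment_bonus", 30),
    "economic issue": ("wealth_bonus", 30),
}


def answer_quiz(article: Article, chosen: str) -> Optional[tuple[str, int]]:
    """Return (bonus type, duration) for a correct match, or None if wrong."""
    if chosen != article.category:
        return None
    return BONUS_BY_CATEGORY[article.category]
```

Keeping the bonus table separate from the matching rule makes it easy to rebalance bonuses without touching the quiz code.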


3.7 Player Actions: Special Actions


Special actions represent emergency edicts that can be issued by players. There are three types of special actions in this game: Emergency Supplies, Environmental and Population Special Actions. All special actions need to be ‘recharged’ after use.

The Emergency Supplies Special Action represents the rapid mass production of supplies. Once selected, it adds a certain amount of emergency supplies to the player’s stock. However, this special action costs wealth and also causes some damage to the natural environment. The Environmental Special Action represents an emergency land reclamation project. It helps improve the overall environmental condition of the region but costs wealth. Lastly, the Population Special Action represents the government’s effort to decrease the population growth rate. It also costs wealth.
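A minimal sketch of the special-action mechanics: each action applies fixed resource changes and then cannot be used again until its recharge time has elapsed. The text fixes only the direction of each effect; the magnitudes, recharge times, class and resource names below are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class SpecialAction:
    name: str
    effects: dict[str, int]   # change applied to each resource when used
    recharge_s: float         # time before the action can be used again
    ready_at: float = 0.0     # game time at which the action is next available

    def use(self, now: float, state: dict[str, int]) -> bool:
        """Apply the action if it has recharged; return whether it was used."""
        if now < self.ready_at:
            return False
        for resource, delta in self.effects.items():
            state[resource] = state.get(resource, 0) + delta
        self.ready_at = now + self.recharge_s
        return True


# Hypothetical magnitudes; only the sign of each effect comes from the text.
EMERGENCY_SUPPLIES = SpecialAction(
    "Emergency Supplies", {"supplies": +200, "wealth": -1000, "environment": -300}, recharge_s=60)
ENVIRONMENTAL = SpecialAction(
    "Environmental", {"environment": +500, "wealth": -1500}, recharge_s=90)
POPULATION = SpecialAction(
    "Population", {"population_growth": -1, "wealth": -1200}, recharge_s=90)
```

Storing the next-available time rather than a countdown avoids having to update every action on each game tick.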


3.8 Random Events 


The aim of the random event system is to raise players’ awareness of the unpredictable nature of environmental and social situations. In THE GROWTH, events such as contamination, disease outbreaks and natural disasters may occur as the game progresses. For example, the game informs players that toxic waste has leaked from a landfill as a result of a landslide. The event causes 1,000 points of environmental damage, and 3,000 wealth and 100 supplies are withdrawn from the player’s treasury to fund a recovery project. As mentioned above, the occurrence of some events depends on players’ performance. For example, a random event informs players that the public has donated 80,000 wealth as a reward for safeguarding their natural environment (this event is more likely to appear when the environmental condition of the region is high).
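The two mechanisms in this paragraph, a fixed-effect event such as the landfill leak and an event whose likelihood grows with the environmental condition, can be sketched as follows. The landfill and donation figures are those quoted above; the linear probability formula, its parameters and all function names are hypothetical.

```python
import random

# Effects of the landfill-leak event described in the text.
LANDFILL_LEAK = {"environment": -1000, "wealth": -3000, "supplies": -100}
# Public donation event described in the text.
PUBLIC_DONATION = {"wealth": +80_000}


def apply_event(state: dict[str, int], effects: dict[str, int]) -> None:
    """Add each effect to the corresponding resource, in place."""
    for resource, delta in effects.items():
        state[resource] = state.get(resource, 0) + delta


def donation_probability(env_condition: float, base: float = 0.01, extra: float = 0.09) -> float:
    """Hypothetical per-turn chance of a donation: rises linearly with the
    environmental condition, which is first clamped to the range [0, 1]."""
    env_condition = min(max(env_condition, 0.0), 1.0)
    return base + extra * env_condition


def roll_donation(state: dict[str, int], env_condition: float, rng: random.Random) -> bool:
    """Trigger the donation event with the condition-dependent probability."""
    if rng.random() < donation_probability(env_condition):
        apply_event(state, PUBLIC_DONATION)
        return True
    return False
```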


4 Preliminary Sessions 


Preliminary sessions were conducted between early 2012 and August 2013. Their objective was to investigate key areas such as participants’ overall satisfaction with the game, motivation and basic learning outcomes. Participants were recruited by purposive sampling. This method was selected because the game focuses on a specific population group (Thai), and because it was convenient for recruiting a limited number of participants within a limited timeframe. A total of seventeen (17) male participants of Thai nationality aged 22-29 voluntarily participated in the preliminary sessions: 11 post-graduate students based at Coventry University, 3 undergraduate students (based in Thailand) and 3 office employees with post-graduate degrees (also based in Thailand). All 11 post-graduate students from Coventry University were contacted directly (face-to-face) by a researcher in the UK; this was the first group to participate in the preliminary test sessions. The other six participants were first contacted by the researcher’s agent in Thailand and, after they had confirmed, the researcher contacted them again by telephone and e-mail to arrange a date and time. None of the six participants in this group had any contact with the researcher prior to their recruitment. Preliminary sessions with this second group were conducted in Bangkok (Thailand).

All participants reported that they had previously played computer games and board games. In regard to computer games, five participants identified themselves as avid gamers, while the other 12 identified themselves as casual gamers. As this project aims to deliver a single-player game, all sessions were conducted with one participant at a time. All sessions were conducted in participants’ private spaces.

The preliminary sessions relied heavily on physical equipment such as paper cards, which were used as a ‘mock-up’ to represent concepts of the digital version. A laptop computer was used to assist the gameplay and to record players’ progress, and a die was used to introduce randomness and uncertainty during the session. Each session was completed within approximately 90 minutes.

A Likert-style rating scale was employed to measure players’ level of satisfaction with the game (e.g. learning curve, theme, graphical representation and game mechanics). A semi-structured interview was employed to establish participants’ demographics and to investigate players’ game experiences and knowledge gains. Follow-up questions were used where necessary to obtain additional information from players. Transcripts were analyzed using thematic analysis, which was selected for its flexibility, the opportunity it offers to gain insights from the information, and its relatively quick execution [23]. All preliminary sessions were conducted in Thai (the participants’ native language).
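Because Likert ratings are ordinal, they are commonly summarized with the median and the frequency of each response level rather than the mean. A minimal sketch of such a per-item summary; the 5-point scale is an assumption (the text does not state the number of points), and the ratings in the usage line are illustrative placeholders, not data from these sessions.

```python
from collections import Counter
from statistics import median


def summarize_likert(ratings: list[int], points: int = 5) -> dict:
    """Median and per-level counts for one Likert item (levels 1..points)."""
    if any(r < 1 or r > points for r in ratings):
        raise ValueError("rating outside the scale")
    counts = Counter(ratings)
    return {
        "n": len(ratings),
        "median": median(ratings),
        "counts": {level: counts.get(level, 0) for level in range(1, points + 1)},
    }


# Illustrative placeholder ratings for a single item (e.g. "theme"):
print(summarize_likert([4, 5, 3, 4, 4]))
```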

Each session started with the researcher welcoming the participant to the study, followed by approximately five minutes of informal discussion to establish an acquaintance with the participant. The researcher then gave a formal introduction to the project. Participants were told that they had been recruited to help evaluate a game project and that their opinions would contribute to future development. In all sessions, the researcher encouraged participants to comment on and criticize the project freely.
4.1 Results and Discussion


The majority of participants did not experience difficulties in learning the game mechanics. However, several participants reported a degree of distraction because multiple pieces of physical equipment were used during the preliminary sessions, owing to the mock-up stage of development. A majority of participants also commented positively on the graphics and visual aesthetics of the game. Participants also appreciated the degree of freedom and the neutral representation in THE GROWTH. One participant noted: “I like the way there are many things you can do to help the planet”. In another account, a participant noted that “The content is rich ... I also like the way [the game is] more about science, saving money and quality of life ... less about just save cute animals”.

In terms of motivation, 16 of 17 participants reported that their main motivation was to learn about new technologies and actions that can be taken to reduce personal energy consumption. Ten of 17 participants showed a strong interest in the social issues highlighted in THE GROWTH. Many participants expressed an interest in reviewing all the cards even after the game sessions, in order to learn more about content they might have missed during the sessions. In one account, a participant noted that “I know about solar panels, but I’ve never imagined using the sun [solar energy] for cooking!” In another account, a participant commented on solar tower technology as follows: “That’s nice, so even the desert can be used to produce energy”.

Cutting personal spending seems to be the top priority for almost all participants, while co-benefits for the environment seem to be acknowledged to a lesser degree. This holds true for both UK- and Thailand-based participants. Energy (gas and electricity) seems to be a popular topic amongst participants, possibly because electricity consumption (and cost) is much higher than that of water. The transportation topic also received good attention from participants; public transportation is already the primary mode of travel for 12 participants. Fifteen of 17 participants admitted that they are unlikely to pay a premium price for environmentally friendly products such as food, but are more likely to invest in energy- and water-efficient products in order to reduce their bills in the long run. Interestingly, 8 participants reported that they would welcome a tax on unsustainable products (e.g. a carbon tax), but only in the form of a policy (i.e. applied to all consumers). According to several participants, real-world factors also play an important role in their commitment to environmental causes. One participant noted that “I’ve been separating recyclable from [municipal] waste for many years, but garbage collectors seem to mix and crush everything altogether in the truck so I’m not sure if the government is still working on this [recycling] or not”. Another participant cited the lack of recycling bins in his area as the reason he stopped recycling. In another account, a participant noted that “I separate my wastes so I can give them [recyclable waste] to [a] waste buyer¹ for free as a good gesture”.

In terms of learning outcomes, knowledge gains were measured by participants’ ability to recall and describe articles from the game. All participants were able to recall and discuss articles from the cards (highest = 8/10, mean = 5.7/10, lowest = 2/10). Energy saving and emerging technologies were the most recalled topics. Deforestation and illegal encroachment were also topics of interest to participants (possibly owing to recent reports earlier this year of the government’s mass prosecution of illegal loggers and of encroachment on natural habitats). Unplanned pregnancies and crime were the most recalled topics in the social category.

¹ Waste buyers are a common sight in many areas of Bangkok. These merchants visit houses in the hope of buying any recyclable waste.

There was no noticeable difference in knowledge gains between avid-gaming and casual-gaming participants. However, office workers and UK-based students demonstrated a greater ability to recall articles from the game, possibly because their responsibility for paying utility bills has increased their awareness.


4.2 Limitations and Conclusion 


THE GROWTH is an environmental game with a special focus on rapid population growth. Unlike many environmental games, THE GROWTH highlights the interrelationships among the natural environment, the economy and society. It offers over 60 methods that players can use to tackle environmental and social problems. As in the real world, each solution has its own advantages and limitations. Some of these solutions are borrowed from experimental practices and could be considered ‘unconventional’ by the standards of many environmental games. This provides players with a degree of freedom (and sometimes exposes them to negative consequences if they fail to consider their choices carefully). Also unlike many environmental games, some factors in THE GROWTH adapt to players’ progress (a dynamic game world). The game’s theme emphasizes that environmental conservation efforts are highly dependent on public support (both physical contribution and funding).

This project is still at the development stage and has many limitations. Results from the preliminary sessions were obtained using a post-test-only design, so the researcher cannot establish whether the positive learning outcomes resulted from participants’ previous exposure to other environmental information or from the game itself. Participants described the game as simple to understand, engaging and educational. Future development aims to expand the game content based on comments from players and experts, and to develop a systematic evaluation framework to explore user-game interactions such as usability, level of engagement, knowledge gain, retention and knowledge transfer.
Abstract. Technology-assisted intervention has the potential to adaptively individualize and improve the outcomes of traditional schizophrenia (SZ) intervention. Virtual reality (VR) technology, in particular, can simulate real-world social and communicative interactions and hence could be useful as a therapeutic platform for SZ. Emotional face recognition is considered one of the core building blocks of social communication, and studies have shown that emotional face processing and understanding are impaired in patients with SZ. The current study develops a novel VR-based system that presents avatars whose facial expressions change dynamically for emotion recognition tasks. Additionally, the system allows real-time measurement of physiological signals and eye gaze during the emotion recognition tasks, which can be used to gain insight into the emotion recognition process in the SZ population. This study further compares VR-based facial emotion recognition with more traditional emotion recognition from static pictures in a small usability study. Results from the usability study suggest that VR could be a viable platform for SZ intervention, and that implicit signals such as physiological signals and eye gaze can be used to better understand underlying patterns that are not available from user reports and performance alone.


Keywords: facial expression, emotion recognition, virtual reality, IAPS, adaptive interaction, eye tracking, physiological processing, schizophrenia intervention.


1 Introduction


Schizophrenia (SZ) is a debilitating psychotic disorder that affects about 1% of the population and costs more than $100 billion annually in the USA. It causes emotional and cognitive impairments [1] and has been described as a splitting of thoughts from feelings [2]. Some psychotic symptoms, such as hallucinations and delusions, are partly ameliorated by antipsychotic drugs, but the route to recovery is hampered by social impairments [3]. Currently available social interventions can be helpful, but low compliance rates and a lack of access to such programs for most patients are problematic.

Deficits in social cognition, including emotion processing, social cue perception, empathy, mental state attribution and theory of mind, lead to poor functional outcomes in SZ even after psychotic symptoms improve [3,4]. Thus, there is a need for efficacious, cost-effective, low-burden and high-compliance interventions for social deficits in SZ, which would likely improve outcomes. Improvement in emotion processing, a core deficit, and in social understanding would be crucial for better social outcomes. SZ patients appear to have impairments in recognizing faces and emotional expressions, and disturbances in emotional functioning are a major source of disability in SZ [5,6]. Traditional static emotional pictures, specifically the International Affective Picture System (IAPS), have been used to elicit emotional experience in SZ [1]. The apparent disconnect between the outward display of emotion by SZ patients and their actual internal feelings could be studied through the involuntary peripheral physiological responses driven by the sympathetic branch of the autonomic nervous system (ANS).

In the context of technology-based SZ intervention, Virtual Reality (VR) systems have been investigated with SZ for symptom assessment [7], training of medication management skills [8], hallucination training [9], social perception [10], role play [11], and improving the diagnosis of SZ [12]. However, there have been few applications of VR to emotion processing and identification in SZ. Moreover, these VR systems rely solely on user reports and outward measures of performance. To mitigate these limitations, the dynamic presentation of emotional expressions can be combined with the processing of implicit physiological responses and eye gaze. Implicit cues can help reveal underlying psychological states that cannot be captured by performance-based systems or simple user reports.

In this work, we present a novel VR-based system that incorporates implicit cues from peripheral physiological signals [13,14] and eye tracking [15] to support the understanding of facial emotional expressions. We compare how an SZ group and a matched group of healthy non-psychiatric adults performed emotion recognition tasks presented in the form of static IAPS slides and in a VR environment with avatars expressing emotions dynamically.

The remainder of the paper is organized as follows. Section 2 describes the details 
of the two systems (i.e., the static IAPS pictures presentation system and the VR sys- 
tem). Section 3 details the methods and procedures followed in the usability study. 
Section 4 presents the results and highlights their implications. Finally, Section 5 
discusses the conclusions and future extensions of this preliminary work. 


2 Systems Overview


Both the IAPS presentation system and the VR system were composed of three major components: the presentation environment, the eye tracking component, and the peripheral physiology monitoring component. The presentation environments were based on the Unity3D game engine by Unity Technologies (http://unity3d.com). A remote desktop eye tracker, the Tobii X120 by Tobii Technologies (www.tobii.com), was employed for gaze tracking. An 8-channel wireless physiological signal acquisition device, BioNomadix by Biopac Inc. (www.biopac.com), was used to record the physiological signals. Each component ran separately while communicating via a network interface.
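The text states only that the components ran separately and communicated via a network interface. One common way to do this is for the presentation engine to send timestamped event markers that the eye tracking and physiology loggers store alongside their own samples. The sketch below assumes UDP on a local network and a hypothetical JSON message format; the field and function names are illustrative, not the system's actual protocol.

```python
import json
import socket
import time


def make_marker(event: str, trial: int) -> bytes:
    """Encode a hypothetical event marker: event name, trial index, send time."""
    return json.dumps({"event": event, "trial": trial, "t": time.time()}).encode("utf-8")


def send_marker(sock: socket.socket, addr: tuple[str, int], event: str, trial: int) -> None:
    """Presentation side: fire-and-forget a marker to a logger."""
    sock.sendto(make_marker(event, trial), addr)


def receive_marker(sock: socket.socket, bufsize: int = 4096) -> dict:
    """Logger side: block until one marker arrives and decode it."""
    data, _sender = sock.recvfrom(bufsize)
    return json.loads(data.decode("utf-8"))


if __name__ == "__main__":
    logger = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    logger.bind(("127.0.0.1", 0))  # port 0: let the OS pick a free port
    logger.settimeout(2.0)
    presenter = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    send_marker(presenter, logger.getsockname(), "trial_start", 3)
    print(receive_marker(logger))
    presenter.close()
    logger.close()
```

UDP keeps latency low but offers no delivery guarantee; if a lost marker would invalidate a trial, a TCP connection is the safer choice.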


2.1 The Static IAPS Presentation 


We developed a picture presentation system using Unity3D that displayed full-screen images on a 24” flat-screen monitor at 1024x768 resolution. The pictures were preselected from the pool of about 600 IAPS pictures [16] and grouped into 6 categories: social positive (erotica), social negative (violence), social neutral (people in normal scenery), non-social positive (food), non-social negative (dirty and unpleasant scenery), and non-social neutral (ordinary objects). In other words, the emotional pictures were broadly divided into social and non-social, and each broad category was further divided into positive, negative and neutral. Each of the 6 categories consisted of 4 pictures. The erotica pictures were selected appropriately for male and female subjects. Each picture was presented for 10 seconds; after each category, the subjects rated how aroused the pictures in the preceding category made them feel (on a pictorial scale of 1-9, see Fig. 1), the valence of the emotion they felt (on a pictorial scale of 1-9) and the actual emotion they felt (one of 5 emotions or neutral). The subjects were seated around 70-80 cm from the computer screen throughout the IAPS presentation session.
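The IAPS protocol (six categories of four pictures, 10 s per picture, ratings collected once per category) can be written down as a simple block schedule. The picture IDs and function names below are hypothetical placeholders, and the fixed category order is an assumption, since the text does not say whether the blocks were shuffled.

```python
CATEGORIES = [
    "social positive", "social negative", "social neutral",
    "non-social positive", "non-social negative", "non-social neutral",
]
PICTURES_PER_CATEGORY = 4
PICTURE_DURATION_S = 10
RATINGS = ["arousal (1-9)", "valence (1-9)", "emotion felt (5 emotions or neutral)"]


def build_iaps_schedule(picture_ids: dict[str, list[str]]) -> list[dict]:
    """One block per category: its four pictures (10 s each), then one rating screen."""
    schedule = []
    for category in CATEGORIES:
        pictures = picture_ids[category]
        if len(pictures) != PICTURES_PER_CATEGORY:
            raise ValueError(f"{category}: expected {PICTURES_PER_CATEGORY} pictures")
        schedule.append({
            "category": category,
            "pictures": [(pid, PICTURE_DURATION_S) for pid in pictures],
            "ratings": RATINGS,
        })
    return schedule


def viewing_time_s(schedule: list[dict]) -> int:
    """Total picture-viewing time, excluding time spent on rating screens."""
    return sum(duration for block in schedule for _pid, duration in block["pictures"])
```

With any valid set of IDs, the six blocks add up to 6 × 4 × 10 = 240 s of picture viewing, plus whatever time the subject spends on the six rating screens.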


Fig. 1. IAPS pictures presentation system with the arousal rating (on-screen prompt: “Please, choose how aroused (or excited) you are right now.”)
2.2 VR Emotion Presentation


The VR environment was originally developed for emotion recognition in adolescents with autism spectrum disorders (ASD) [17]. Owing to the similarity of the emotion recognition impairments in ASD and SZ, we customized the system for the new target group with 5 emotions (joy, surprise, fear, anger, and sadness). The avatars were customized and rigged using an online animation service, Mixamo (www.mixamo.com), together with Autodesk Maya. All facial expressions and the lip-syncing for the contextual stories narrated by the avatars were animated in Maya. A total of seven avatars (4 boys and 3 girls) were selected. Close to 20 facial bone rigs were controlled by set-driven key controllers to produce realistic facial expressions and phonetic visemes for lip-sync. Each facial expression had four arousal levels (i.e., low, medium, high, and extreme, see Fig. 2). A total of 315 animations (16 lip-synced stories, 28 emotional expressions and one neutral expression for each character) were developed and imported into the Unity3D game engine for task presentation.
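The total of 315 animations follows directly from the per-character inventory stated above, multiplied by the seven avatars:

```latex
\[
\underbrace{7}_{\text{avatars}} \times \bigl(\underbrace{16}_{\text{lip-synced stories}}
  + \underbrace{28}_{\text{emotional expressions}} + \underbrace{1}_{\text{neutral}}\bigr)
  = 7 \times 45 = 315
\]
```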


Fig. 2. Example surprise emotion with its four degrees of arousal 


The logged data were analyzed offline to illustrate differences in physiological and gaze responses between the patient and control groups.


2.3 Eye Tracking and Physiological Monitoring Components


The eye tracker recorded at 120 Hz, allowing free head movement within a 30 x 22 x 30 cm volume (width x height x depth) at a distance of approximately 70 cm. Two applications were connected to the eye tracker: one for diagnostic visualization as the experiment progressed and another to record, pre-process and log the eye tracking data. The main eye tracker application computed eye physiological indices (PI), such as pupil diameter (PD) and blink rate (BR), and behavioral indices (BI) [18], such as fixation duration (FD), from the raw gaze data.
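The text does not state which algorithm the eye tracker application used to derive fixations from raw gaze. A widely used option is the dispersion-threshold (I-DT) method described by Salvucci and Goldberg; the sketch below applies it to samples recorded at 120 Hz. The 35-pixel dispersion threshold and 100 ms minimum duration are hypothetical defaults, not values from this study.

```python
def dispersion(points):
    """I-DT dispersion: (max x - min x) + (max y - min y)."""
    xs = [p[0] for p in points]
    ys = [p[1] for p in points]
    return (max(xs) - min(xs)) + (max(ys) - min(ys))


def idt_fixations(samples, fs=120.0, max_dispersion=35.0, min_duration_s=0.1):
    """Detect fixations in a list of (x, y) gaze points sampled at fs Hz.

    Returns (start index, end index exclusive, duration in seconds) tuples.
    """
    min_len = max(1, round(min_duration_s * fs))
    fixations = []
    i, n = 0, len(samples)
    while i + min_len <= n:
        if dispersion(samples[i:i + min_len]) <= max_dispersion:
            j = i + min_len
            # grow the window while the points stay tightly clustered
            while j < n and dispersion(samples[i:j + 1]) <= max_dispersion:
                j += 1
            fixations.append((i, j, (j - i) / fs))
            i = j
        else:
            i += 1  # first sample belongs to a saccade: slide the window on
    return fixations
```

Mean fixation duration (FD) is then the average of the third element over all detected fixations, and the number of tuples gives a fixation count.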

The wireless BioNomadix physiological monitoring system, with a total of 8 channels of physiological signals, sampled at 1000 Hz. The physiological signals monitored were the pulse plethysmogram (PPG), skin temperature (SKT), galvanic skin response (GSR), 3 electromyograms (EMG), and respiration (RSP). Because of the apparent disconnect between what patients with SZ feel and what they express, they are usually not expressive of their internal affective states, and these states are often not visible externally [2]. Physiological signals, however, are relatively less affected by these impairments and can be useful in understanding internal psychological states and patterns [1]. Among the signals we monitored, GSR and PPG are directly related to the sympathetic response of the autonomic nervous system (ANS) [19].
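Heart rate is typically derived from the PPG by locating successive pulse peaks and converting the mean inter-beat interval (IBI) to beats per minute. The paper does not describe its peak-detection method; the sketch below is a deliberately simple local-maximum detector with a refractory period, assuming an already band-pass-filtered PPG sampled at the stated 1000 Hz. The 0.33 s minimum peak spacing (a ceiling of roughly 180 bpm) and the function names are hypothetical choices.

```python
import math


def detect_peaks(signal, fs, min_interval_s=0.33):
    """Indices of local maxima above the signal mean, at least min_interval_s apart."""
    min_gap = int(min_interval_s * fs)
    threshold = sum(signal) / len(signal)
    peaks = []
    for i in range(1, len(signal) - 1):
        is_local_max = signal[i - 1] <= signal[i] > signal[i + 1]
        if is_local_max and signal[i] > threshold:
            if not peaks or i - peaks[-1] >= min_gap:
                peaks.append(i)
    return peaks


def heart_rate_bpm(ppg, fs=1000):
    """Mean heart rate from the average inter-beat interval between PPG peaks."""
    peaks = detect_peaks(ppg, fs)
    if len(peaks) < 2:
        raise ValueError("need at least two detected beats")
    ibis_s = [(b - a) / fs for a, b in zip(peaks, peaks[1:])]
    return 60.0 / (sum(ibis_s) / len(ibis_s))


if __name__ == "__main__":
    fs = 1000
    # synthetic 10 s "PPG": a 1.2 Hz sinusoid, i.e. 72 beats per minute
    ppg = [math.sin(2 * math.pi * 1.2 * n / fs) for n in range(10 * fs)]
    print(round(heart_rate_bpm(ppg, fs), 1))
```

Real PPG waveforms have a dicrotic notch and baseline drift, which is why filtering and the refractory period matter in practice.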


2.4 Physiological and Gaze Data Analysis 


The collected physiological data were processed to extract useful features and to identify any differences between the two subject groups during the presentation of selected emotional expressions and during a neutral baseline condition. We specifically chose features from PPG, GSR, RSP and SKT for this analysis because of their correlation with engagement and the emotion recognition process, as noted in the psychophysiology literature [14,19,20,1]. The PPG was used to extract heart rate (HR), a cardiac index used to measure stress and certain emotions [21]. The GSR was decomposed into its two major components, the phasic and tonic components, from which features such as the skin conductance response rate (SCRrate) and the mean skin conductance level (SCL) were extracted. The RSP signal was used to extract the breathing rate (BR), and the mean skin temperature (SKT) was obtained from the SKT signal. For the eye tracking data, we extracted the following features: pupil diameter (PD), fixation duration (FD), sum of fixation counts (SFC), saccade path length (SPL), and blink rate (BR). A two-sample unequal-variance t-test was used to quantify the significance of the differences.
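The paper does not specify how the GSR was split into tonic and phasic components. As a rough sketch only, the tonic level can be approximated with a long moving average and the phasic component taken as the residual; SCRs are then counted as upward crossings of an amplitude threshold. Dedicated decomposition methods are more accurate. The 4 s window, the 0.05 µS threshold, the function names and the assumption that the GSR has first been downsampled (here to 10 Hz) are all hypothetical.

```python
def moving_average(x, window):
    """Centred moving average via a running sum; the window shrinks at the edges."""
    prefix = [0.0]
    for v in x:
        prefix.append(prefix[-1] + v)
    half, n = window // 2, len(x)
    out = []
    for i in range(n):
        lo, hi = max(0, i - half), min(n, i + half + 1)
        out.append((prefix[hi] - prefix[lo]) / (hi - lo))
    return out


def gsr_features(gsr_us, fs, tonic_window_s=4.0, scr_threshold_us=0.05):
    """Return SCL (mean tonic level, µS) and SCRrate (responses per minute)."""
    tonic = moving_average(gsr_us, int(tonic_window_s * fs))
    phasic = [g - t for g, t in zip(gsr_us, tonic)]
    # count an SCR each time the phasic signal rises through the threshold
    onsets = sum(1 for a, b in zip(phasic, phasic[1:]) if a < scr_threshold_us <= b)
    minutes = len(gsr_us) / fs / 60.0
    return {"SCL": sum(tonic) / len(tonic), "SCRrate": onsets / minutes}
```

The moving-average tonic estimate is pulled upward slightly by each response, so this approach underestimates SCR amplitudes; it is adequate only for counting well-separated responses.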


3 Methods and Procedure 


3.1 Experimental Setup


The presentation engine ran on Unity, while eye tracking and peripheral physiological monitoring were performed in parallel by separate applications on separate machines that communicated with the Unity-based presentation engine via a network interface. The VR task was presented on a 24” flat LCD panel monitor (at a resolution of 1920 x 1080), while the IAPS pictures were presented on the same monitor at a resolution of 1024 x 768 to preserve the original resolution of the images. The experiment was performed in a laboratory with two rooms separated by one-way glass windows for observation. The researchers sat in the outer room, while the subject sat in front of the task computer in the inner room. The task computer display was also routed to the outer room so that the researchers could observe it. Each session was video recorded for the whole duration of participation. The study was approved by the Institutional Review Board of Vanderbilt University.


3.2 Subjects 


A total of 6 patients with SZ (3 male, 3 female; age M = 45.67, SD = 9.09) and 6 age- and IQ-matched healthy non-psychiatric controls (5 male, 1 female; age M = 42.5, SD = 8.21) were recruited and participated in the usability study. All patients were recruited through existing clinical research programs and had an established clinical diagnosis (Table 1).
Table 1. Profile of subjects in the patient group and the control group

Demographic information   Healthy controls (1F, 5M)   Schizophrenia (3F, 3M)
Age                       42.5 (8.21)                 45.67 (9.09)
IQᵃ                       109.5 (8.86)                106.5 (4.09)
Years of education        16.5 (2.34)                 13.33 (1.97)
Age of illness onset      -                           20.83 (4.07)
Medication doseᵇ          -                           386.89 (194.65)
Current symptomsᶜ
  BPRS                    -                           10.83 (2.56)
  SAPS                    -                           13 (5.66)
  SANS                    -                           21.83 (9.06)

Values are mean (SD). ᵃPremorbid intelligence was assessed using the North American Adult Reading Test [22]. ᵇChlorpromazine equivalent (mg/day; [23]). ᶜSemi-structured clinical interviews assessing symptoms over the past month: Brief Psychiatric Rating Scale (BPRS; [24]); Scale for the Assessment of Positive Symptoms (SAPS; [15]); and the Scale for the Assessment of Negative Symptoms (SANS; [3]).


The control group was recruited from the local community. The IQ measures served as a potential screen for the intellectual competence needed to complete the tasks.


3.3 Tasks


The VR-based system presented a total of 20 trials, corresponding to the 5 emotional expressions at 4 levels each. Each trial was 12-15 s long. In each trial, the avatar first narrated a context story linked to the emotional expression, which it then displayed for the next 5 seconds. The avatar showed a neutral face during the storytelling. The IAPS pictures, by contrast, were presented in blocks by category, with ratings collected after each category rather than after each trial as in the VR system. This resulted in a total of 6 IAPS trials, each consisting of the four pictures of one category shown for 10 s each. Note that the four pictures in each category were carefully selected to elicit equivalent emotional responses.

A typical laboratory visit lasted approximately 1 hour and 30 minutes. During the first 15 minutes, a trained therapist prepared the subject for the experiment by placing the physiological sensors. Before the task began, the eye tracker was calibrated using a fast 9-point calibration that took about 10-15 s. At the start of each task, a welcome screen greeted the subject and described what was about to happen and how the subject was to interact with the system. The trials started immediately after the welcome screen. In the VR system, a questionnaire at the end of each trial asked the subject which emotion he or she thought the avatar had displayed and how confident he or she was in that choice. The questionnaires for the IAPS pictures asked for the level of arousal and the valence of the emotion the subject felt, together with the emotion felt while watching the pictures. The emotional expressions were presented in a randomized order for each subject to avoid ordering effects. To avoid other confounding factors arising from the context stories, the stories were recorded in a monotone, and the avatars displayed no specific facial expression during the context period.
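The VR task design (5 emotions × 4 arousal levels = 20 trials, each a neutral-faced context story followed by a 5 s expression and a two-item questionnaire, in an order randomized per subject) can be sketched as a trial generator. Seeding the shuffle with the subject ID is an assumption added here so that each subject's order is reproducible; the function and field names are hypothetical.

```python
import itertools
import random

EMOTIONS = ["joy", "surprise", "fear", "anger", "sadness"]
AROUSAL_LEVELS = ["low", "medium", "high", "extreme"]
EXPRESSION_S = 5  # duration of the expression that follows the context story


def make_trials(subject_id: int) -> list[dict]:
    """All 20 emotion x arousal combinations in a per-subject random order."""
    trials = [
        {"emotion": e, "arousal": a, "expression_s": EXPRESSION_S,
         "questions": ["Which emotion did the avatar show?", "How confident are you?"]}
        for e, a in itertools.product(EMOTIONS, AROUSAL_LEVELS)
    ]
    random.Random(subject_id).shuffle(trials)  # reproducible order for each subject
    return trials
```

Using a dedicated `random.Random` instance rather than the module-level generator keeps each subject's order independent of any other randomness in the program.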


4 Results and Discussion


We compared the similarities and differences in physiological and eye-behavioral
responses using physiological and eye tracking features. Five features were extracted
from each of the physiological and eye tracking data sets. They were used to compare the
responses elicited during the facial emotion recognition tasks in the dynamic virtual
environment with those elicited by the static IAPS presentation system. The results
indicate differences between the patient group and the control group in both
physiological and eye tracking indices. For both physiological and eye tracking data, we
categorized the trials into three groups (negative, positive, and neutral) in the IAPS
study and into two groups (positive and negative) in the VR study. In the IAPS picture
presentation, the 6 trials were categorized into the three groups by combining the
social and non-social stimuli. In the VR presentation, the prominent positive
(joy and surprise) and negative (anger and disgust) emotions were combined at high and
extreme levels of arousal. We also extracted baseline features from the physiological
data to determine whether the responses in these categories were above or below the
baseline values.
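The grouping and baseline comparison described above can be sketched as follows. This is a minimal illustration and not the authors' processing code. The trial record layout, the field names (`emotion`, `arousal`) and the helper names are hypothetical. Only the VR grouping rule comes from the text: joy and surprise are positive, anger and disgust are negative, and only high and extreme arousal trials are kept. The heart-rate values are made up. The 82.24 bpm baseline is the patient-group HR baseline reported in this section.

```python
from statistics import mean

# VR grouping taken from the text; the dictionary layout itself is hypothetical.
VR_CATEGORY = {"joy": "positive", "surprise": "positive",
               "anger": "negative", "disgust": "negative"}
INCLUDED_AROUSAL = {"high", "extreme"}


def category_means(trials, feature):
    """Average one physiological feature per emotion category.

    trials: list of dicts such as {"emotion": "joy", "arousal": "high", "HR": 85.0}.
    Trials with an unmapped emotion or a lower arousal level are skipped.
    """
    groups = {}
    for trial in trials:
        category = VR_CATEGORY.get(trial["emotion"])
        if category is None or trial["arousal"] not in INCLUDED_AROUSAL:
            continue
        groups.setdefault(category, []).append(trial[feature])
    return {category: mean(values) for category, values in groups.items()}


def relative_to_baseline(cat_means, baseline):
    """Label each category mean as 'above', 'below' or 'equal' to the baseline."""
    labels = {}
    for category, value in cat_means.items():
        if value > baseline:
            labels[category] = "above"
        elif value < baseline:
            labels[category] = "below"
        else:
            labels[category] = "equal"
    return labels


# Toy heart-rate values (made up for illustration).
trials = [
    {"emotion": "joy", "arousal": "high", "HR": 86.0},
    {"emotion": "surprise", "arousal": "extreme", "HR": 84.0},
    {"emotion": "anger", "arousal": "extreme", "HR": 80.0},
    {"emotion": "disgust", "arousal": "low", "HR": 90.0},  # skipped: low arousal
]
hr_means = category_means(trials, "HR")
print(hr_means)                               # {'positive': 85.0, 'negative': 80.0}
print(relative_to_baseline(hr_means, 82.24))  # {'positive': 'above', 'negative': 'below'}
```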


4.1 Physiological Features Comparison 


Table 2. IAPS Pictures Session Physiological Features

              Positive Category                   Negative Category
              Patients          Controls          Patients          Controls
Feature       Mean     SD       Mean     SD       Mean     SD       Mean     SD
HR (bpm)      85.24    9.27     82.92    10.76    84.30    15.75    86.15    11.33
SKT (°F)      85.44    11.72    93.09    3.05     85.70    11.84    93.11    3.14
BR (bpm)      16.60    6.11     16.62    3.13     18.28    9.01     15.31    4.56
SCL (µS)*     5.21     3.33     8.84     4.21     5.19     3.17     8.79     4.30
SCR rate      2.61     1.38     2.12     0.96     2.13     1.29     2.38     1.14

*p < 0.05


As shown in Table 2, the patient group had higher emotional response indicators when
presented with positive rather than negative emotional pictures. These included a higher
heart rate, a higher skin conductance response rate, and a comparable breathing rate.
However, only the tonic skin conductance level (SCL) differed statistically
significantly between the groups in both the positive and negative emotional categories.
Table 3. VR Session Physiological Features

              Positive Category                   Negative Category
              Patients          Controls          Patients          Controls
Feature       Mean     SD       Mean     SD       Mean     SD       Mean     SD
HR (bpm)      85.53    13.23    80.80    15.04    83.27    10.39    83.10    14.36
SKT (°F)*     86.84    11.28    94.29    2.35     86.78    11.15    94.22    2.43
BR (bpm)      17.25    12.77    17.65    7.04     23.18    15.95    21.35    15.07
SCL (µS)*     4.76     2.84     7.94     4.05     4.81     2.91     7.75     4.02
SCR rate      2.65     2.89     3.18     2.73     2.72     2.73     2.68     2.70

*p < 0.05


Table 3 shows that the patient group differed from the control group in a similar way
as in the IAPS static picture presentation. However, in the VR case, both SKT and SCL
differed statistically significantly between the groups in both the positive and
negative emotion categories.

The baseline values, as mean (SD), were as follows.
1) Patient group: HR 82.24 (10.76) bpm, SKT 86.59 (10.93) °F, BR 28.87 (16.66) bpm,
SCL 4.23 (2.41) µS, and SCR rate 2 (0.61).
2) Control group: HR 79.18 (11.72) bpm, SKT 94.64 (1.55) °F, BR 16.56 (3.53) bpm,
SCL 7.79 (3.55) µS, and SCR rate 2.1 (0.65).
Note that BR for the patient group decreased from baseline in almost all conditions,
whereas it increased for the control group. The patient group also responded more
strongly in the positive category than in the negative one. This agrees with existing
literature reporting that, compared with non-psychotic controls, people with
schizophrenia show an increased response to positive rather than negative emotional
facial pictures.


4.2 Eye Tracking Indices Comparison 


Most of the eye gaze indices showed statistically significant differences between the
two groups (Table 4). The patient group showed more saccadic eye movement, as
indicated by higher SFC, lower FD and shorter SPL. These indices are known to correlate
with engagement, which suggests that the patients were less engaged than the control
group.
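The sketch below illustrates one way fixation-based indices of this kind can be derived from raw gaze samples. It uses a velocity-threshold (I-VT) fixation filter, a common technique that the text does not say was used here.

Several details are assumptions:
- the function names, the threshold and the 60 Hz toy sampling rate;
- the choice to report mean fixation duration and mean distance between consecutive fixations.

Whether the paper's FD and SPL were computed this way is not stated. SFC is not computed because this section does not define it.

```python
import math


def ivt_fixations(samples, rate_hz, velocity_threshold):
    """Velocity-threshold (I-VT) fixation detection.

    samples: list of (x, y) gaze positions in pixels, sampled at rate_hz.
    velocity_threshold: pixels per second; slower steps are treated as fixation.
    Returns fixations as (centroid_x, centroid_y, duration_ms), counting each
    sample in a fixation as one sampling interval.
    """
    dt = 1.0 / rate_hz
    groups, current = [], [samples[0]]
    for prev, cur in zip(samples, samples[1:]):
        if math.dist(prev, cur) / dt < velocity_threshold:
            current.append(cur)          # still fixating
        else:
            if len(current) > 1:         # a fast step (saccade) ends the fixation
                groups.append(current)
            current = [cur]
    if len(current) > 1:
        groups.append(current)
    return [
        (sum(p[0] for p in g) / len(g), sum(p[1] for p in g) / len(g),
         len(g) * dt * 1000.0)
        for g in groups
    ]


def mean_fixation_duration(fixations):
    """Mean fixation duration in milliseconds."""
    return sum(f[2] for f in fixations) / len(fixations)


def mean_interfixation_distance(fixations):
    """Mean Euclidean distance (pixels) between consecutive fixation centroids.

    Needs at least two fixations.
    """
    steps = [math.dist(a[:2], b[:2]) for a, b in zip(fixations, fixations[1:])]
    return sum(steps) / len(steps)


# Toy data: 10 samples at (100, 100), then a jump to (400, 100) for 10 samples.
gaze = [(100.0, 100.0)] * 10 + [(400.0, 100.0)] * 10
fixes = ivt_fixations(gaze, rate_hz=60.0, velocity_threshold=1000.0)
print(len(fixes), round(mean_fixation_duration(fixes), 1),
      mean_interfixation_distance(fixes))  # 2 166.7 300.0
```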


Table 4. IAPS Pictures Session Eye Features

              Positive Category                   Negative Category
              Patients          Controls          Patients          Controls
Feature       Mean     SD       Mean     SD       Mean     SD       Mean     SD
PD (mm)       2.66     0.37     2.91     0.24     2.68     0.36     2.91     0.19
FD (ms)*      203.91   177.04   453.72   176.03   170.91   103.27   371.00   146.84
SFC*          150.67   56.56    92.00    35.41    172.58   69.86    101.67   41.99
SPL (pix)*    67.24    24.56    139.00   46.73    92.16    32.35    147.63   54.69
BR (bpm)      11.25    3.51     10.83    5.00     10.58    4.70     10.08    4.94

*p < 0.05
A similar pattern of lower engagement in the patient group than in the control group
was observed during the VR presentation (Table 5). The only difference in this case was
pupil diameter, which differed statistically significantly, with the patient group
having a lower PD. Pupil constriction has been associated with engagement. The blink
rate differed statistically for the negative category of emotions.


Table 5. VR Pictures Session Eye Features

              Positive Category                   Negative Category
              Patients          Controls          Patients          Controls
Feature       Mean     SD       Mean     SD       Mean     SD       Mean     SD
PD (mm)*      2.61     0.39     2.91     0.20     2.62     0.40     2.95     0.18
FD (ms)*      123.38   125.71   439.67   345.32   134.00   118.98   450.78   333.50
SFC*          56.65    28.08    35.75    28.36    57.35    24.98    37.20    30.04
SPL (pix)*    48.43    30.12    101.04   43.29    52.48    35.14    88.74    38.93
BR (bpm)      2.10     1.09     3.10     1.55     2.70     1.65     2.70     1.45

*p < 0.05


5 Conclusion and Future Work


Both the IAPS and the VR systems presented the facial emotional expression trials
successfully. Eye tracking and various physiological signals were collected and
analyzed offline. The results of the gaze and physiological feature-level analysis show
that these features are viable indicators of the internal emotional states of patients
with SZ. This holds even though the patients' self-reports can be biased by their
impairments in emotion processing and understanding. Overall, for almost all the
features, the patient group responded slightly more strongly to the positive emotion
presentations than to both the negative and the neutral (baseline, in the case of VR)
emotion conditions. This preliminary study could inform future adaptive VR applications
for SZ therapy. Such applications could harness the inherent processing patterns of
patients with SZ, as captured from their gaze and physiological signals. Such an
implicit mode of interaction has an advantage over performance-only interaction: it
allows objective, extensive, and natural interaction with the virtual social avatars.
The current system has several limitations, related to the design of the emotional
expressions in the VR system and to its limited interactivity. Despite these, this
initial study demonstrates the value of future adaptive VR-based SZ intervention
systems. Several capabilities could be very useful tools: subtly adjusting the emotional
expressions of the avatars, integrating this platform into more relevant social
paradigms, and embedding online physiological and gaze data to guide interactions and to
understand the psychological states of patients with SZ. We believe such capabilities
will enable more adaptive, individualized and autonomic therapeutic systems in the long
run.
Abstract. This paper presents a cognitive training based on a brain-computer
interface (BCI) that was developed for an adult subject with an attention disorder.
Following the neurofeedback methodology, the user processes his own electrical brain
activity in real time, as detected by a non-invasive EEG device. The subject was
trained to actively self-modulate his own electrical patterns within a play therapy
that uses a reward-based virtual environment. Moreover, a consumer easy-to-use EEG
headset was used in order to assess its suitability for a concrete clinical
application. At the end of the training, the patient obtained a significant
improvement in attention.


Keywords: Play therapy, Attention training, Rehabilitation, Brain-computer
interface (BCI), Neurofeedback.


1 Introduction 


In recent decades, the development of new human-computer interaction technologies has
made it possible to interface the human brain directly with digital devices and to
control them using only our thoughts. The brain-computer interface (BCI) based on
electroencephalography (EEG) has attracted the attention of the scientific community
thanks to recent improvements in performance and applications [27] [7]. These
applications cover a wide range of areas such as entertainment (e.g., video games) [18],
military enhancement and assistive technologies [15].

One of the most interesting areas of investigation concerns the clinical
rehabilitation of physical and cognitive deficits. On the one hand, it is possible to
enhance the physical capabilities of disabled patients with methodologies such as
silent speech interfaces [17], thought-driven wheelchairs [9] and prosthetic devices
[16]. On the other hand, BCI can be exploited to rehabilitate patients with cognitive
deficits. Within the neuropsychology field, one of the most successful applications
deals with attention disorders (e.g., ADHD [10] [14] [3]).

This paper presents an innovative and user-friendly way to apply consumer BCI
technologies and play therapy with virtual reality in neuropsychological research on
attention disorders. Immersive virtual reality (VR) cognitive training has already been
shown to be effective for behavioural and attention problems [25]. In the
neuropsychological rehabilitation field, previous research has generally used game-like
training environments in order to increase the patient's motivation and participation
[23] [24]. In that regard, good virtual reality-based rehabilitation has to address the
usability of the human-computer interaction technologies employed. Therefore, one of the
challenges for the BCI community in recent years has been to develop more advanced
devices and experimental methodologies in terms of cost for consumers and usability.
This is even more important in the rehabilitation field with cognitively or physically
disabled patients, who typically have more difficulty feeling comfortable with
conventional EEG equipment.


Fig. 1. The Emotiv EPOC and the electrode locations


In recent times, simplified consumer BCI EEG headsets have been introduced, such as the
Emotiv EPOC [26] and NeuroSky [27]. The research community is still debating the
accuracy and suitability of consumer BCI electronics in clinical environments [1].
Although its signal is less clear and less strong, the Emotiv EPOC's recording accuracy
has already been assessed in the literature as of reasonable quality compared with a
medical-grade device [8]. The Emotiv EPOC device makes it possible to simplify the
equipment setup. It avoids the practical difficulties of EEG procedures, such as skin
abrasion or the application of conductive gel to the subject. This is a particularly
valuable advantage in attention disorder rehabilitation with restless patients. Our
study confirms that it is possible to use this type of headset in a clinical context.
There, the usability of the device (i.e., wireless connectivity, saline solution instead
of gel, fixed arrangement of electrodes) can positively influence the compliance of the
subject.

Based on the neurofeedback methodology, we carried out cognitive training with an adult
subject suffering from a frontal syndrome. In line with this approach, the user was
confronted in real time with his own electrical brain activity: using a reward-based
virtual environment, we trained the subject with a video game to actively self-modulate
his own electrical patterns. In addition, this study aims to further reduce problems
related to compliance and to unfamiliarity with clinical equipment. Moreover, the
procedure was embedded within a game-like environment to challenge the patient. This
methodology aims to develop a training in which the subject is more motivated and
involved than in a typical clinical context.

The first step of the cognitive training was to record specific electrical patterns
with the Emotiv EPOC. These were used as input commands in the video game. The game then
asked the participant to repeatedly recall various specific patterns corresponding to
different movements of an object in a 3D space, with levels of increasing difficulty.
Each correct move triggers the appearance of a positive reinforcement stimulus. In this
case a slightly erotic kind of reward was chosen, since the patient's frontal syndrome
was characterized by sexual disinhibition. During and after the training, the subject's
attention deficit was assessed with three different neuropsychological tests (Posner,
CPT-II, d2). We show that this combination of new BCI technology and play therapy can
produce significant results: at the end of the training in this case study, the subject
had improved his attention skills.


2 Methods 


2.1 Experimental Design 


The study uses a within-subject experimental protocol in which a manipulated variable
is switched on and off. The experiment alternates two types of phases: a training phase
with neurofeedback (A) and a resting phase (B) without any experimental stimulus. These
two phases are repeated twice in alternation, and each lasts one month. Each of the two
training phases (A) consists of five one-hour training meetings. The cognitive
performance of the subject is assessed at the beginning and the end of each phase with
the same neuropsychological tests. We expect to find significant performance
improvements at the end of each experimental phase and no significant changes at the
end of each resting phase.


2.2 Subject 


The participant of this single case study is G.F. (male, age 36). In October 2003, he
suffered a head injury in a car accident. As a result, he suffers from a frontal
syndrome of medium-high severity, with cognitive and behavioural disorders as outcome.
Regarding the cognitive profile, previous neuropsychological assessments identified
"damage to the frontal lobes with impairments charged to attention and concentration,
the ability to support a cognitive activity over time and switch from one line of
thought to another". At the behavioural level, his syndrome was diagnosed with the
following symptoms: loss of spontaneous initiative (apathy); a depressive mood with a
tendency to restlessness, irritability and aggression; lack of awareness of his own
cognitive disorders; and sexual disinhibition.

The subject has no prior experience with BCI and neurofeedback. 
2.3 BCI Device and Software


For signal acquisition, an EEG recording device produced by Emotiv Systems is used: the
Emotiv EPOC. The device has 14 channels with electrodes in a fixed arrangement,
localized according to the International 10-20 System [12], and CMS/DRL references in
the P3/P4 locations (see Figure 1). The sampling frequency is 128 Hz. The EPOC filter is
set from 0.2 Hz to 43 Hz. Applying the sensors is easy and takes only a few minutes. It
is sufficient to wet the small sponges with a saline solution; the sponges conduct the
electrical signal from the scalp to the EEG electrodes. No electro-conductive paste or
abrasion of the scalp is needed.
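The sketch below illustrates what a 0.2-43 Hz pass band means for a signal sampled at 128 Hz, using a simple offline FFT band-pass in NumPy. It is an illustration under stated assumptions only. The headset's internal filter is not described here, and this brick-wall FFT mask should not be taken as its implementation. The function name and the synthetic test signal are hypothetical.

```python
import numpy as np

FS_HZ = 128.0               # sampling frequency stated in the text
PASS_BAND_HZ = (0.2, 43.0)  # pass band stated in the text


def fft_bandpass(signal, fs=FS_HZ, band=PASS_BAND_HZ):
    """Offline brick-wall band-pass: zero every FFT bin outside `band`.

    Non-causal and for illustration only; not the headset's own filter.
    """
    spectrum = np.fft.rfft(signal)
    freqs = np.fft.rfftfreq(len(signal), d=1.0 / fs)
    keep = (freqs >= band[0]) & (freqs <= band[1])
    return np.fft.irfft(spectrum * keep, n=len(signal))


# 4 s of synthetic data: DC offset + 10 Hz (in band) + 60 Hz (out of band).
t = np.arange(int(4 * FS_HZ)) / FS_HZ
in_band = np.sin(2 * np.pi * 10 * t)
raw = 5.0 + in_band + 0.5 * np.sin(2 * np.pi * 60 * t)
clean = fft_bandpass(raw)
print(np.allclose(clean, in_band, atol=1e-9))  # True
```

Because each test frequency completes a whole number of cycles in the 4 s window, the mask removes the DC offset and the 60 Hz component without spectral leakage. With real EEG, a causal IIR or FIR filter would normally be preferred for online use.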

The computer acquires the EEG signal wirelessly, directly from the EPOC device.
Processing occurs online through the EPOC Software Development Kit (SDK), and the output
is passed to a graphical user interface developed for the experiment. This interface was
developed with the OpenGL library and the C++ language on a Windows XP machine with
Visual Studio 2010 Express. During the training, it was displayed on a 21-inch LCD
monitor.


2.4 Task Structure 


During the training, the participant is asked to repeatedly recall and produce
specific electrical patterns. These are used as input commands for the task. The
feedback consists of two components: the corresponding movement of a cube in a 3D space
and the appearance of a positive reinforcement stimulus.

The first step is the Recording of the EEG patterns. The subject begins by defining a
baseline through a 30-second EEG recording in a neutral state. Then, for every possible
cube movement, the corresponding pattern is recorded for 8 seconds (e.g., one for UP,
one for DOWN, one for LEFT, and so on). Once these recordings are complete, the
participant has the set of commands needed for the Test phase.

The Test consists of a block of 40 consecutive trials of 15 seconds each (Figure 2a).
At the beginning of each trial, a word at the centre of the screen indicates the
direction in which the cube has to be moved within the next 15-second interval
(Figure 2b, I). In each trial, the subject must recall the pre-recorded pattern
associated with the requested movement. The direction requests are randomized and
equally distributed within the 40-trial block.
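A block with randomized, equally distributed direction requests can be generated by repeating each direction the same number of times and then shuffling. The sketch below is a hypothetical helper; the name `make_test_block` and the seeding are assumptions. It requires the number of directions to divide the block length. In a 40-trial block, that holds for one, two, four or five directions.

```python
import random
from collections import Counter


def make_test_block(directions, n_trials=40, seed=None):
    """Shuffled list of direction requests, each appearing equally often."""
    if n_trials % len(directions):
        raise ValueError("n_trials must be a multiple of the number of directions")
    block = list(directions) * (n_trials // len(directions))
    random.Random(seed).shuffle(block)
    return block


# Level 2 of the game uses two movements, e.g. UP and DOWN.
block = make_test_block(["UP", "DOWN"], seed=7)
print(len(block), Counter(block))  # 40 trials, 20 of each direction
```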

In each trial, a red bar on the left side of the screen indicates the power of the
recalled pattern (Figure 2b, II). Once the intensity exceeds 65%, a positive
reinforcement visual stimulus (a slightly erotic image) fades in, becoming
progressively sharper up to 100% intensity (Figure 2b, III). Conversely, if the player
moves the cube in the wrong direction, the reinforcement is not shown (Figure 2b, IV).
This type of stimulus was chosen in view of the subject's sexual disinhibition.
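The feedback rule above can be expressed as a mapping from pattern intensity to the opacity of the reward image. The text only says that the image fades in above 65% and becomes progressively sharper up to 100%. The linear ramp, the 0-1 intensity scale and the function name below are assumptions made for illustration.

```python
REWARD_THRESHOLD = 0.65  # 65% intensity, from the text


def reward_opacity(intensity, correct_direction, threshold=REWARD_THRESHOLD):
    """Opacity in [0, 1] of the reward image for a pattern intensity in [0, 1].

    Wrong-direction moves and intensities at or below the threshold show
    nothing; above it, opacity rises linearly (an assumption) to 1 at 100%.
    """
    if not correct_direction or intensity <= threshold:
        return 0.0
    return min(1.0, (intensity - threshold) / (1.0 - threshold))


for value in (0.50, 0.65, 0.825, 1.0):
    print(value, round(reward_opacity(value, correct_direction=True), 3))
# 0.5 0.0
# 0.65 0.0
# 0.825 0.5
# 1.0 1.0
```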
Fig. 2. a) Task structure. b) Single trial.


2.5 Training Procedure 


The two experimental phases (A) each consist of five training meetings, held once or
twice a week over one month in a laboratory of the Department of General Psychology,
Padua, Italy.

At the beginning of the whole training, G.F. was told to think of distinct mental
states that were easy to recall. He was also told that these thoughts would be
translated into electrical patterns, detected by the EEG headset and used as commands
for the cube movement in the game.

During each meeting, after 10 minutes of practice to become familiar with the task, the
participant begins the training: two sessions, each consisting of a Recording phase and
a Test phase. A short break separates the two identical sessions to give the subject time
to recover from the attentional effort. The difficulty of the requests increases over
the entire cognitive training, and the game is divided into levels. In the first level,
the player has to actively perform one movement with the cube in all the trials (e.g.,
UP). In the second level, two movements are requested (e.g., UP and DOWN), randomly
distributed in the trial block. Each following level increases the number of movements
in the same way.

To unlock the next level, the player must reach a 95% accuracy rate on the requested
movements in both Test sessions of a meeting. A movement counts as correct when it
crosses the 65% intensity threshold, indicated by the reward stimulus fading in. This
criterion ensures that, once a level has been completed, the movement skill has been
fully learned before another movement is added in the next level. With this procedure,
the participant faces a sustained attention task from the very first level. In the later
levels of the training, the selective component of attention is also required, because
the subject must switch between two or more movements. After the resting phase (B), the
subject starts the game again from level 1 in the subsequent training phase (A).
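The level-unlock criterion can be written as a small check over the trial outcomes of one meeting. This is a hypothetical helper, not the study's software. It assumes each trial has already been scored as a hit when the requested movement crossed the 65% intensity threshold.

```python
def level_passed(test_sessions, criterion=0.95):
    """True if every Test session of a meeting reaches the accuracy criterion.

    test_sessions: one list per Test session, holding one boolean per trial
    (True = requested movement performed above the 65% intensity threshold).
    """
    return all(sum(trials) / len(trials) >= criterion for trials in test_sessions)


session_1 = [True] * 38 + [False] * 2   # 95% hits
session_2 = [True] * 37 + [False] * 3   # 92.5% hits
print(level_passed([session_1, session_1]))  # True
print(level_passed([session_1, session_2]))  # False: one session below 95%
```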


2.6 Neuropsychological Tests 


For the attention assessment, an adaptation of Posner's spatial cueing task [19], the
d2 test [2] and the CPT-II [5] are used.

In this study, a computerized version of Posner's paradigm was chosen to assess mainly
the intensive component of attention through precise measurement of response accuracy
(ACC) and reaction times (RT). The trials are divided into 8 blocks of 48 trials each
and follow one another with a variable interval of 50 ms to 150 ms. The time between the
cue and the target (stimulus onset asynchrony, SOA) can be 200 ms or 800 ms. The test
has a total duration of 30 minutes.
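The trial timing described above can be sketched as a schedule generator. Several details are assumptions: the uniform distribution of the 50-150 ms interval, the random (rather than counterbalanced) choice of SOA, and the function name. Cue validity and target location are omitted because they are not specified here.

```python
import random


def posner_schedule(n_blocks=8, trials_per_block=48, seed=None):
    """Per-trial timing for a Posner-style cueing test (hypothetical sketch)."""
    rng = random.Random(seed)
    schedule = []
    for block in range(1, n_blocks + 1):
        for _ in range(trials_per_block):
            schedule.append({
                "block": block,
                "interval_ms": rng.uniform(50.0, 150.0),  # variable inter-trial time
                "soa_ms": rng.choice((200, 800)),         # cue-to-target asynchrony
            })
    return schedule


posner_trials = posner_schedule(seed=3)
print(len(posner_trials))  # 384 trials (8 blocks x 48)
```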

The Continuous Performance Test is a visual test performed on the computer with an
oddball paradigm. It is used to assess attention and vigilance, signal detection, and
the ability to inhibit automatic responses [4]. Here, Conners' version of the test is
used (CCPT-II).

The d2 test is a cancellation test characterized by the simultaneous presentation of
visually similar stimuli. It is a standardized measurement method that is particularly
accurate in detecting individual abilities of selective attention and concentration [2].

The tests were administered at the beginning and end of every Training (A) and Rest (B)
phase, one month apart, for a total of five measurements taken at times t1, t2, t3, t4
and t5. These correspond, respectively, to the start of the experiment, the end of the
first training phase, the end of the first rest phase, the end of the second training
phase and the end of the second rest phase.


3 Results 


3.1 Posner 


In the analysis of the test results, the Accuracy (Acc) and Reaction Time (RT) values
are considered. The obtained values were analyzed using a paired-samples t-test,
comparing the performances recorded after the different phases (Figure 3). Significant
improvements were found as a result of the training phases (A). For Accuracy, the t-test
shows a significant difference between the beginning and the end of the first phase (A)
of cognitive training (t(6) = −9.128, p < 0.001). The analysis also shows a significant
reduction of Reaction Times (RT) as a result of the first experimental session
(t(6) = 42.965, p < 0.001) and the second one (t(6) = 8.916, p < 0.001). After the first
rest phase (B), the t-test shows no significant differences from previous assessments in
either Accuracy or Reaction Times. After the second rest phase, however, the t-test shows
a significant difference for both parameters (Acc, t4 vs. t5: t(6) = 2.661, p < 0.05;
RT, t4 vs. t5: t(6) = −4.676, p < 0.05).
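For reference, a paired-samples t statistic is the mean of the within-pair differences divided by its standard error, with n − 1 degrees of freedom; t(6) therefore corresponds to seven pairs. The sketch below uses only the Python standard library. The function name and the toy values are hypothetical, and it computes only the statistic, not the p-value.

```python
import math
from statistics import mean, stdev


def paired_t(before, after):
    """Paired-samples t statistic (after minus before) and its degrees of freedom."""
    if len(before) != len(after) or len(before) < 2:
        raise ValueError("need two samples of equal length, n >= 2")
    diffs = [a - b for a, b in zip(after, before)]
    # Undefined (division by zero) if all differences are identical.
    standard_error = stdev(diffs) / math.sqrt(len(diffs))
    return mean(diffs) / standard_error, len(diffs) - 1


# Toy data with seven pairs (not the study's measurements).
before = [1, 2, 3, 4, 5, 6, 7]
after = [2, 4, 4, 6, 6, 8, 9]
t_value, df = paired_t(before, after)
print(round(t_value, 3), df)  # 7.778 6
```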


[Figure: two panels, "Accuracy" and "Reaction times", plotted over time (t1-t5), with Training and Rest phases marked]
Fig. 3. Posner test results


3.2 CPT-II


The five assessments reveal a trend similar to the one detected by Posner's test. The
performance progression is analyzed through several parameters that indicate attention
capacity and impulse control: Confidence Index, Omissions, Commissions, Reaction Time,
Variability of reaction times, and Detectability. Except for the Confidence Index, the
scores are converted to T-scores. The significance of the changes between performances
in each parameter is calculated with the Reliable Change Index [17]. Clinically
significant changes have been detected in the following parameters (see Figure 4).
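The Reliable Change Index is commonly computed in the Jacobson-Truax form: the difference between two scores divided by the standard error of the difference. That standard error is derived from the measure's standard deviation and its test-retest reliability, and |RCI| > 1.96 is the usual criterion. The sketch below assumes that form, since the text does not state which variant was used. It uses the T-score standard deviation of 10, a hypothetical reliability of 0.84, and made-up scores.

```python
import math

T_SCORE_SD = 10.0  # T-scores have mean 50 and SD 10 by construction


def reliable_change_index(score_1, score_2, sd=T_SCORE_SD, reliability=0.84):
    """Jacobson-Truax RCI; the default reliability is a hypothetical value."""
    standard_error = sd * math.sqrt(1.0 - reliability)
    s_diff = math.sqrt(2.0 * standard_error ** 2)
    return (score_2 - score_1) / s_diff


def is_reliable(rci, critical=1.96):
    """Two-tailed 5% criterion commonly used with the RCI."""
    return abs(rci) > critical


rci = reliable_change_index(65.0, 50.0)   # made-up T-scores before and after
print(round(rci, 2), is_reliable(rci))    # -2.65 True
```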


Confidence Index: the probability of presenting an attention disorder; a value above
50% is classified as clinically at risk. The values show improvements after both
training phases. A significant change was recorded after the second training phase
compared with the first assessment (t1: 52.4%; t4: 42.3%). The classification moves
from a clinical attention deficit at t1 (52.4%) to non-clinical at t2 (49.9%) and
remains non-clinical until the end of the study (t5: 45.5%).

Three parameters show significant improvements after the first training phase:
Commissions (the subject responds to the non-target stimulus or responds too slowly),
Variability (the degree of consistency of the speed of response) and Discrimination (the
value related to the ability to correctly identify the target stimuli). Their
performance is assessed as "mildly atypical" (i.e., T-score > 60) at t1 and within the
average at t2. This significant change, resulting from the first training phase,
remains stable and within the average until the end of the study.
3.3 d2


The raw scores obtained in the different categories were converted to z-scores. In
every parameter, the values increased as a result of both training phases.
Furthermore, the first assessment is almost entirely out of the average range
(z-score > 2), except for the parameter Total characters processed. The last
assessment, by contrast, is characterized by data within the average, with the exception
of errors of Commission (see Figure 5).


244 F. Benedetti et al. 


4 Discussion 


In the present study, an attention training program based on neurofeedback was
developed. The subject's attention was assessed during the different phases to measure
the evolution of his performance over time in relation to the training. The expected
step-like trend was recorded in all three tests: the results show significant
improvements after the two training phases and generally enhanced performance at the
end of the study.

The attention improvements result from the effort of self-modulating the EEG patterns
requested within the structured neurofeedback training. This effort engaged several
higher cognitive abilities: the strategic process of recalling these patterns quickly
and precisely, the ability to self-control and regulate one's behaviour, and the
sustained and selective attention required by the training.

In recent decades, research on neurofeedback has shown significant results in the
rehabilitation of attention disorders (e.g., ADHD [10] [14] [3]). Game-like trainings
have been used to increase the subjects' participation and motivation. The key to the
training was in fact to motivate the patient to stay focused and to challenge himself
further, since attention deficit and behavioural issues can be considerable obstacles
in a cognitive training.

Virtual reality-based play therapy appeared to provide a solution to this challenge
[24] [28] [29]. By choosing a VR game, we aimed to develop an intervention tool that
would be challenging and appealing for the user. The game was designed following a set
of guidelines already shown to be effective for cognitive rehabilitation [23] [30]. We
chose to develop a game that gives the user immediate rewards based on performance. In
this case, a slightly erotic kind of reward was chosen to increase the appeal of the
therapy, since the frontal syndrome was characterized by sexual disinhibition. The game
also provided the patient with quantitative performance data: a gauge representing his
ability and precise rules for completing the levels. In this way, the patient was led to
engage actively and responsibly in his own treatment by evaluating his performance
online. The levels were structured to find the right challenge to make the game fun
(flow), gradually raising the complexity of the task and the attentional effort
required.

Moreover, the evolution towards inexpensive and easy-to-use headsets can be considered
an essential step towards a new generation of user-friendly BCI training equipment [20].
In this study, using the Emotiv EPOC device made it possible to simplify the equipment
setup. It avoided the practical difficulties of EEG procedures, such as skin abrasion
and the application of conductive gel, which matters especially with a restless patient
like ours. The simple usability of this headset (i.e., wireless connectivity, saline
solution instead of gel, fixed arrangement of electrodes, etc.) positively influenced
the compliance of the subject. On the one hand, the Emotiv's recording accuracy has
already been assessed in the literature as having "reasonable quality compared to a
medical grade device" [22]. On the other hand, regarding its suitability in clinical
environments, the results of this study show the important potential of this kind of
device for concrete clinical applications. We obtained significant positive results for
the subject's attention deficit. We could also confirm that such technologies facilitate
the creation of user-friendly training environments and can therefore improve subjects'
compliance rates. In our opinion, such devices make it possible to develop a training
and a play therapy in which the subject is more motivated and involved than in a typical
clinical context.


5 Conclusion 


In this paper, a virtual reality-based play therapy with neurofeedback was used for a
patient with an attention disorder. Previous experiments showed, and this single case
study confirms, that the effort of self-modulating one's electrical patterns in a BCI
has significant positive implications for attention disorders. In a clinical setting,
creating a user-centered training with an easy-fitting procedure for the patient can
also be crucial. When choosing the experimental procedure, the right EEG device and the
reinforcements, our challenge was to find a trade-off between the user's motivation and
goals and his mental health state. Because of the patient's attention deficit and
restlessness, we shifted the focus of our training to usability and appeal. The aim was
to let the patient concentrate on the task without any environmental or technical
distractions related to the EEG device, and to make his effort as pleasant as possible.
A non-invasive EEG device was used, since our priority was to develop a comfortable
training system. Our experiment allowed us to eliminate complex procedures, which were
deemed not feasible for this kind of patient. Moreover, play therapy was an effective
answer to the patient's motivation problem. Starting from the results of this case
study, our aim is to extend the same procedure to a larger number of subjects in order
to further confirm our results.

In recent years, the effectiveness of BCI therapy has been confirmed, and the related
technologies have become more commercially accessible and usable. However, until now it
has not been possible to carry out neurofeedback training without the assistance of a
therapist. The innovative aspect of this new kind of consumer equipment (combined with
well-designed training and adequate reinforcements) is that the patient should
eventually be able to undertake his training independently. One could also envisage a
domestic therapy with home exercises and training programs (telerehabilitation) that are
easy to use for the patient or his caregiver. The challenge for us is therefore threefold:
to further develop lighter and more precise EEG devices, to create appealing and adequate
training software, and to deepen the study of BCI and neurofeedback methods so that
users learn the interface more effectively. Achieving this requires a partnership
between engineering, computer science, neuroscience and psychology. Through it, virtual
reality and related technologies can be better applied to healthcare and rehabilitation.
Abstract. Mirror therapy has been used for many years to treat phantom limb pain in
amputees. However, this approach has several limitations that new technologies could
help overcome. In this paper we present a novel approach based on augmented reality, 3D
tracking and 3D modeling to enhance the capabilities of classic mirror therapy. The
system was conceived to be integrated into a three-step treatment called “Graded motor
imagery”, which includes limb laterality recognition, motor imagery and, finally, mirror
therapy. Aiming at future home-care therapy, we chose to work with low-cost
technologies, studying their advantages and drawbacks.

In this paper, we present the conception and a first qualitative evaluation of the
developed system.


Keywords: Augmented Reality, 3D tracking, 3D modeling, phantom limb pain 
treatment, mirror therapy. 


1 Introduction 


In this paper we introduce a system based on augmented reality for the treatment of phantom limb pain. The expression “phantom limb” describes the abnormal sensation that a limb persists after an amputation, or after it has become unresponsive for other reasons (such as a stroke). Although people experiencing this phenomenon are aware that the feeling is not real, they usually experience painful sensations in the amputated limb, known as “phantom limb pain”. The causes of these symptoms are not entirely clear, and several coexisting theories attempt to explain the mechanisms underlying this syndrome [1].

To appreciate the scale of the phenomenon: 90-98% of people report a phantom limb sensation after an amputation, and in about 85% of cases it is accompanied by uncomfortable or painful sensations, physical limitation and disability. In 70% of cases, the phantom sensation is still painful 25 years after the loss of the limb [2].

The main treatment methods for phantom limb pain described in the literature are mirror therapy, motor imagery and graded motor imagery. All these treatments aim to recreate a correct cerebral representation of the missing limb in order to reduce phantom limb pain. In this paper, we focus on mirror therapy. Mirror therapy was invented by V. S. Ramachandran [3] to help relieve phantom limb pain, a condition in which patients “feel” that they still have the lost limb. The patient hides the stump behind a mirror (see Fig. 1), and the reflection of the healthy limb creates the illusion that both limbs are present. The illusion persists while the patient tries to perform symmetric movements. Several experiments [4, 5] have shown that the mirror approach helps to reduce phantom limb pain, although there is currently no general consensus on the actual effectiveness of mirror therapy [6].


Fig. 1. Example of use of the mirror box by a healthy person. We tested the mirror therapy in 
order to get a better understanding of its limitations. 


Starting from these assumptions, the goal of this project is to exploit the capabilities of new technologies to develop an “augmented reality mirror therapy” that increases the patient’s immersion and engagement while removing some constraints of classic mirror therapy (e.g., restricted movements and a limited number of exercises). We want to study the feasibility of integrating an “augmented reality mirror therapy” into an occupational therapy treatment for patients who have undergone a lower-limb amputation.

Using augmented reality (AR) to improve the classic mirror offers several advantages. First, AR makes it possible for the patient to perform more varied movements, and even actions that are impossible with a simple mirror, such as movements that cross the body’s midline (otherwise blocked by the mirror) or interaction with virtual objects to play games or perform exercises of varying complexity. These possibilities could increase the patient’s participation in the therapy by offering more entertaining scenarios. Second, the scenarios can be adapted to each patient’s needs or interests, for instance through gamification for younger patients or additional guidance for patients who need it. Finally, the therapist can choose the most appropriate exercise scenario for the patient’s physical capabilities, which can vary greatly from person to person depending on factors such as age and amputation type.
2 Background and Related Work 


Many studies have tried to improve classic mirror therapy with approaches based on virtual reality (VR) or AR, aiming to provide a more immersive and interactive experience for the patient.

Murray et al. [7, 8] analyzed the use of VR as a treatment for phantom limb pain. The authors presented a test protocol focused on quantifying the pain perceived by the participants before and after sessions with the mirror box in VR. Three real cases were analyzed over three weeks and several sessions; all three participants reported a decrease in pain in at least one session.

Two systems for hand movement rehabilitation, one based on VR and one on AR, were compared in [9]. The study showed that the AR approach gave better results, especially in terms of the realism of the simulation.

Desmond et al. [10] presented a mirror therapy approach based on AR and tested it with three patients, comparing the results with the classic mirror box. Instead of using a head-mounted display (HMD) for the AR, they used a simple screen, with a consequent loss of immersion. They observed similar results with the two approaches, except for rather vivid sensations experienced by patients when AR was used to display unexpected or abnormal movements.

In [11], the authors developed an AR prototype consisting of an HMD and a stereo camera system. The system recorded images of the patient’s healthy hand, processed them in real time to create a reproduction of the missing hand, and finally displayed the virtual hand in place of the missing one. Unfortunately, the authors did not present any study of the use of their system with patients.


3 Methods 


Similarly to [11], our work aims to develop an AR system using an HMD to improve the immersion of classic mirror therapy. However, our approach extends previous work in several respects, which are highlighted in this section. First of all, we focused on the treatment of patients with lower-limb amputations. We chose this direction because lower-limb amputations represent the great majority of cases [12], and because most previous work focused only on the upper limbs. However, our approach can easily be extended to track and model the patient’s arms.


As life expectancy grows, the need for medical care will keep increasing in the coming years, making “Home Care” increasingly important. For this reason we chose to adopt low-cost technologies available on the market, with a view to eventually bringing the therapy directly into patients’ homes. However, this first study will be held in a hospital, under the direct supervision of occupational therapists. After analyzing several options, we chose the following devices (Fig. 2):
• Microsoft Kinect, to track the present limb and animate the 3D model of the missing limb.

• NaturalPoint TrackIR 5 with TrackClip PRO, for head tracking.

• Vuzix Warp 920AR, for visualization.


Fig. 2. The devices used in the system: (from left) Microsoft Kinect, Vuzix Warp 920AR and 
NaturalPoint TrackIR 5 


To make the exercises as useful as possible for the patient, we designed them with occupational therapists, drawing on the exercises they usually perform with amputees.

Finally, our system will be used and evaluated within a medical research project with amputees, as part of a therapy that also includes limb laterality recognition tasks and motor imagery (“Graded motor imagery”).


From a technical point of view, our approach is based on three main pillars: 


Augmented Reality. To improve the immersion, realism and interactivity of mirror therapy, we chose to create a mixed-reality system in which the patient sees his real, healthy leg together with a virtual model of the missing leg replacing the stump. Moreover, augmented reality makes it possible to integrate exercises with virtual objects that would be impossible with the classic system and that could help motivate the patient to practice rehabilitation exercises.


3D Tracking. The present limb is tracked continuously in real time in order to animate the virtual model of the missing limb; in particular, we use information about the movements and rotations of the hips, knees and ankles. Moreover, we track the orientation of the patient’s head to know the patient’s point of view at all times, so that virtual and real objects can be mixed consistently (for instance, to place the virtual limb in the right spot relative to the amputation and the patient’s point of view).
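The tracking step relies on hip, knee and ankle information. As a minimal sketch, assuming each joint arrives as a 3D point in one common camera frame (Kinect skeleton joints are reported this way, in meters), a joint angle such as knee flexion can be derived from three positions. The function names below are hypothetical and are not part of the Kinect SDK.

```python
import math

Vec3 = tuple[float, float, float]


def joint_angle_deg(a: Vec3, b: Vec3, c: Vec3) -> float:
    """Angle in degrees at joint b between the segments b->a and b->c."""
    u = [a[i] - b[i] for i in range(3)]
    v = [c[i] - b[i] for i in range(3)]
    norm = math.hypot(*u) * math.hypot(*v)
    if norm == 0.0:
        raise ValueError("two joints share the same position")
    cos_theta = sum(x * y for x, y in zip(u, v)) / norm
    # Clamp to [-1, 1] so rounding noise cannot push acos out of its domain.
    return math.degrees(math.acos(max(-1.0, min(1.0, cos_theta))))


def knee_flexion_deg(hip: Vec3, knee: Vec3, ankle: Vec3) -> float:
    """0 degrees for a straight leg; the value grows as the knee bends."""
    return 180.0 - joint_angle_deg(hip, knee, ankle)
```

For a straight leg (hip, knee and ankle collinear) `knee_flexion_deg` returns 0; with the shank at a right angle to the thigh it returns 90.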


3D Modeling. A virtual model of the missing limb is reconstructed from information about the present limb. For instance, we used parameters such as calf diameter, leg length and skin color to create a realistic 3D model of the missing leg. In our case, skin color is particularly relevant since the exercises are often performed with a bare leg. Moreover, we added physical constraints to avoid abnormal movements of the model when the tracking data are noisy or imprecise. We developed four leg models to account for amputations at hip or knee level (see Fig. 3).
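The paper does not specify how the physical constraints are implemented. One simple possibility, sketched here as an assumption rather than the authors' method, is to clamp each tracked joint angle to an anatomically plausible range and then smooth it with an exponential moving average. The class name and the 0 to 140 degree knee range in the usage note are illustrative only.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class JointFilter:
    """Clamps a tracked joint angle to a plausible range, then smooths it."""
    min_deg: float
    max_deg: float
    alpha: float = 0.5  # in (0, 1]; 1.0 disables smoothing
    state: Optional[float] = None

    def update(self, measured_deg: float) -> float:
        # Physical constraint: discard angles the modeled joint cannot reach.
        clamped = max(self.min_deg, min(self.max_deg, measured_deg))
        # Exponential moving average: damps frame-to-frame tracking jitter.
        if self.state is None:
            self.state = clamped
        else:
            self.state = self.alpha * clamped + (1.0 - self.alpha) * self.state
        return self.state
```

Usage: `knee = JointFilter(min_deg=0.0, max_deg=140.0)`, then `knee.update(angle)` once per tracking frame. A smaller `alpha` suppresses more jitter but makes the virtual limb react more slowly, so the smoothing strength trades stability against added delay.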
Fig. 3. The four leg models (without the skin texture) 


The next section will present the use case of the application. 


4 Use Case Scenario 


The research project that included the development of the prototype presented in this paper proposes an occupational therapy session composed of three steps (“Graded motor imagery”): limb laterality recognition tasks, motor imagery and, finally, mirror therapy with augmented reality.

The first step, “Limb laterality recognition”, has the patient correctly identify pictures of right and left hands/legs in various positions. The second step, “Motor imagery”, asks the patient to mentally represent movements of the amputated leg. The whole process is important for the patient’s rehabilitation; however, since this paper focuses mainly on the conception and development of a prototype for AR mirror therapy, the scenario presented in this section focuses on the third step, mirror therapy.

The therapy takes place over several sessions. The first session requires an additional setup step, so even in the home-care scenario it will be held in a hospital under the supervision of an occupational therapist. During the first session the system records the patient’s data: the patient sits in front of a camera in a well-defined position, in a controlled environment (i.e., fixed room illumination and a uniform background color). Given the patient’s distance from the camera and the camera parameters, we can automatically measure leg parameters such as the leg dimensions (e.g., calf diameter and thigh length) and the skin color (Fig. 4).
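Measuring the leg from a single picture works because, under the pinhole camera model, an object of real size S at distance Z, seen by a camera with focal length f expressed in pixels, spans about f·S/Z pixels. The sketch below inverts that relation and averages sampled pixel colors for the skin tone. It assumes the leg is roughly parallel to the image plane at a known distance and that the leg pixels have already been segmented (the controlled background and lighting make that easier); the function names are hypothetical.

```python
def pixels_to_meters(length_px: float, distance_m: float, focal_px: float) -> float:
    """Pinhole camera model: real size = image size * depth / focal length."""
    if focal_px <= 0 or distance_m <= 0:
        raise ValueError("focal length and distance must be positive")
    return length_px * distance_m / focal_px


def mean_skin_color(samples: list[tuple[int, int, int]]) -> tuple[float, float, float]:
    """Average RGB of pixels sampled from the bare-leg region."""
    if not samples:
        raise ValueError("no skin samples")
    n = len(samples)
    r, g, b = (sum(p[i] for p in samples) / n for i in range(3))
    return (r, g, b)
```

For instance, a calf 50 pixels wide seen at 2 m by a camera with a 1000-pixel focal length gives `pixels_to_meters(50, 2.0, 1000)`, i.e. 0.1 m; these numbers are illustrative, not measurements from the study.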

These parameters are stored along with the patient’s personal information (such as age and type of amputation) and then assigned to the 3D leg model so that it matches the characteristics of the present limb and the amputation level. This setup phase is needed only once per patient; from the second session on, the patient’s data can simply be reloaded into the system. If the patient’s skin color changes significantly (for instance, after intense tanning), a new model can be created.
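Storing the first-session measurements so they can be reloaded in later sessions could look like the following sketch, which serializes a per-patient record to JSON. The record layout, including the `amputated_side` field that would help select among the leg models, is an assumption made for illustration, not the project's actual storage format.

```python
import json
from dataclasses import asdict, dataclass


@dataclass
class LegProfile:
    """Per-patient data captured in the first session (fields are illustrative)."""
    patient_id: str
    age: int
    amputation_level: str      # e.g. "knee" or "hip"
    amputated_side: str        # "left" or "right"
    calf_diameter_m: float
    thigh_length_m: float
    skin_rgb: tuple[int, int, int]


def profile_to_json(profile: LegProfile) -> str:
    return json.dumps(asdict(profile), indent=2)


def profile_from_json(text: str) -> LegProfile:
    data = json.loads(text)
    # JSON has no tuple type, so turn the stored list back into a tuple.
    data["skin_rgb"] = tuple(data["skin_rgb"])
    return LegProfile(**data)
```

Writing the returned string to a file in the first session and reading it back at the start of each later session reproduces the "record once, reload afterwards" workflow; recreating the model after a skin-color change just means overwriting the record.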
Fig. 4. Example of a picture used to calculate the leg parameters (left) that are assigned to the 3D leg model (right) 


The following steps are common to every session. While the first session is directed by the therapist, from the second session on the patient will be able to follow the therapy autonomously at home.

Once the leg model is ready (recorded or reloaded), the occupational therapy session can begin: the patient takes position in the exercise area (i.e., inside the field of view of the Kinect and the TrackIR). The setup is depicted in Fig. 5.

Initially, a short automatic calibration phase detects the body position and head orientation. The 3D leg model is then displayed attached to the patient’s body in the correct position, according to the tracking information provided by the Kinect and the stored information about the patient’s amputation level. The model is then animated according to the movement of the healthy leg.

Depending on the exercise chosen by the therapist, the virtual limb can either perform symmetric (mirrored) movements or replicate the movements of the healthy limb.
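The two animation modes differ only in how the healthy leg's joints are mapped onto the virtual leg. In this sketch, which is assumed rather than taken from the paper, "mirror" reflects each joint across the body's sagittal plane and "replicate" copies the same motion shifted from one hip to the other. It assumes a frame whose x axis runs left-right across the body and that a position is available for the hip on the amputated side (tracked or estimated); all names are hypothetical.

```python
Vec3 = tuple[float, float, float]


def mirror_point(p: Vec3, midline_x: float) -> Vec3:
    """Reflect across the sagittal plane x = midline_x (x is the lateral axis)."""
    x, y, z = p
    return (2.0 * midline_x - x, y, z)


def replicate_point(p: Vec3, healthy_hip: Vec3, missing_hip: Vec3) -> Vec3:
    """Copy the movement by translating it from one hip to the other."""
    return tuple(p[i] + (missing_hip[i] - healthy_hip[i]) for i in range(3))


def virtual_leg(joints: dict[str, Vec3], mode: str,
                healthy_hip: Vec3, missing_hip: Vec3) -> dict[str, Vec3]:
    """Pose of the virtual leg computed from the tracked healthy-leg joints."""
    if mode == "mirror":
        # The midline is assumed to lie halfway between the two hips.
        midline_x = 0.5 * (healthy_hip[0] + missing_hip[0])
        return {name: mirror_point(p, midline_x) for name, p in joints.items()}
    if mode == "replicate":
        return {name: replicate_point(p, healthy_hip, missing_hip)
                for name, p in joints.items()}
    raise ValueError(f"unknown mode: {mode!r}")
```

In "mirror" mode a kick to the right by the healthy leg becomes a kick to the left by the virtual leg, as with the physical mirror box; "replicate" mode has no mirror-box equivalent, which is one of the freedoms AR adds.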


Fig. 5. Example of the setup of the exercise area
Finally, the patient can interact with virtual objects in the scene using the virtual limb as well as the real limb (see Fig. 6). Coherence between the user’s perspective and the virtual objects in the scene (virtual leg, exercise objects, etc.) is maintained constantly by tracking the user’s head position and orientation.
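Keeping virtual objects consistent with the user's perspective amounts to expressing scene points in the head's coordinate frame before rendering. This is a minimal sketch under stated assumptions: the head tracker supplies a position plus yaw and pitch angles (roll is ignored), the head frame has x to the right, y up and the gaze along -z, and the head orientation is yaw followed by pitch. These conventions are illustrative and are not the TrackIR interface.

```python
import math

Vec3 = tuple[float, float, float]


def _rot_y(v: Vec3, deg: float) -> Vec3:
    c, s = math.cos(math.radians(deg)), math.sin(math.radians(deg))
    x, y, z = v
    return (c * x + s * z, y, -s * x + c * z)


def _rot_x(v: Vec3, deg: float) -> Vec3:
    c, s = math.cos(math.radians(deg)), math.sin(math.radians(deg))
    x, y, z = v
    return (x, c * y - s * z, s * y + c * z)


def world_to_head(p: Vec3, head_pos: Vec3, yaw_deg: float, pitch_deg: float) -> Vec3:
    """Express a world point in the head frame (x right, y up, gaze along -z).

    Head orientation is R = Ry(yaw) * Rx(pitch); its inverse applied to the
    offset from the head is R^T (p - t) = Rx(-pitch) * Ry(-yaw) * (p - t).
    """
    d = tuple(p[i] - head_pos[i] for i in range(3))
    return _rot_x(_rot_y(d, -yaw_deg), -pitch_deg)
```

A rendering engine would normally fold the same inverse transform into its view matrix; the point-wise version simply makes the math explicit. A negative pitch here corresponds to the head tilted forward, as in the leg exercises.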

In this first prototype we developed a simple game in which the patient can use the healthy leg and the virtual leg together to pick up and move a virtual ball (see [13] for a complete video demonstration).
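The ball game needs a contact test between each foot and the ball. A simple approach, assumed here since the paper does not describe its collision handling, is to model the feet and the ball as spheres and to check both the healthy and the virtual foot every frame. The response below just lets the ball follow the touching foot, a simplification of "pick up and move"; the names and radii are hypothetical.

```python
import math

Vec3 = tuple[float, float, float]


def touching(foot: Vec3, foot_radius: float, ball: Vec3, ball_radius: float) -> bool:
    """Sphere-sphere test: contact when centers are closer than the summed radii."""
    return math.dist(foot, ball) <= foot_radius + ball_radius


def step_ball(ball: Vec3, feet: list[tuple[Vec3, Vec3]], foot_radius: float,
              ball_radius: float, dt: float) -> Vec3:
    """Move the ball with the first foot (real or virtual) that touches it.

    `feet` holds (position, velocity) pairs, one for the healthy leg and one
    for the virtual leg, so either leg can push the same ball.
    """
    for pos, vel in feet:
        if touching(pos, foot_radius, ball, ball_radius):
            return tuple(b + v * dt for b, v in zip(ball, vel))
    return ball
```

Treating the virtual foot exactly like the real one is what lets the patient use both legs on the same object; a fuller version would add gravity, friction and a grasp state for lifting.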


Fig. 6. Screenshot of a healthy user testing the application (from the user point-of-view) 


During this preliminary study, after each therapy session we asked the participant to fill in a survey on the session, focusing on the usability of the system and the realism of the simulation.


5 Discussion 


The preliminary tests we performed during this study revealed several interesting points to be analyzed and developed in future work.

Using commercial devices helped us create an application that would be easy to deploy in a user’s home without lengthy training. Both the Kinect and the Vuzix Warp 920AR are essentially plug and play. The only problem we encountered was setting up the TrackIR for head tracking: this device works well only at distances between 61 cm and 152 cm from the user’s head. This forced us to mount the TrackIR on a support in front of the user, which caused a small occlusion in the Kinect’s field of view. Moreover, since our exercises required the user to keep the head tilted far forward, we had to place the sensor at knee level and carefully adjust the inclination of the TrackIR camera to detect head movements in this particular position. The occlusion did not cause significant problems: the Kinect tracking algorithm proved robust enough to handle small occlusions in space and time.
Finally, the Kinect and the TrackIR are sensitive to infrared light (both rely on infrared emitters and cameras); however, simply avoiding placing the patient in front of a window prevented any issue related to infrared noise.

While using commercial devices allowed us to build a system that is fairly easy to set up, the limitations imposed by these devices are numerous and need to be discussed.

The Kinect’s precision is good enough to track the human body and leg movements well. However, tracking of ankle movements is less precise: abduction/adduction¹ movements are tracked fairly well, while plantar flexion/dorsiflexion² movements are essentially ignored. A future application aiming to track more subtle movements (for instance, finger movements) should consider another sensor or technique.

The AR glasses used have a 31-degree diagonal field of view. This means that the user sees a sort of “window on the real world” inside a black frame. This field of view is good enough to perform most occupational therapy exercises involving the legs (as can be seen in Fig. 6, the legs are visible from the top of the knees down). However, a wider field of view could improve the user’s immersion.
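Because the glasses' field of view is quoted as a diagonal, it helps to see what that means per axis. Assuming a flat (rectilinear) projection whose aspect ratio matches the 640 x 480 displays, the tangent of each half-angle scales with that axis over the diagonal: tan(h/2) = tan(d/2) · w / sqrt(w² + h_px²), and likewise for the vertical angle. The helper name is hypothetical.

```python
import math


def split_diagonal_fov(diag_deg: float, width_px: int, height_px: int) -> tuple[float, float]:
    """Horizontal and vertical FOV (degrees) from a diagonal FOV.

    Assumes a rectilinear projection whose aspect ratio matches the display.
    """
    diag_px = math.hypot(width_px, height_px)
    t = math.tan(math.radians(diag_deg) / 2.0)
    h = 2.0 * math.degrees(math.atan(t * width_px / diag_px))
    v = 2.0 * math.degrees(math.atan(t * height_px / diag_px))
    return h, v
```

`split_diagonal_fov(31.0, 640, 480)` gives roughly 25 degrees horizontally by 19 degrees vertically, or about 25 pixels per degree across the 640-pixel width, which puts the narrow "window on the real world" into numbers.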

The system provides a fairly realistic representation of the missing limb by adapting the 3D model to match the color and size of the healthy leg. The resolution of the HMD (two 640 x 480 LCD displays, 60 Hz progressive scan update rate, 24-bit true color) does not allow much finer detail; for this reason, in this first prototype we ignored important leg characteristics such as hairiness and muscle mass, which should probably be taken into account in future, higher-resolution versions.

Finally, due to tracking and image processing time, there is a short delay between a movement of the present limb and the display of the corresponding movement of the virtual limb. Deeper analyses are needed to evaluate the impact of this lag on the therapy and how the user perceives it.
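One way to quantify the lag, offered as an assumption rather than the authors' plan, is to record the same motion signal (for example, a knee angle) for the real leg and for the displayed virtual leg on a common clock, then find the shift that best aligns the two by cross-correlation. The resolution is one sample, so the lag in milliseconds is the returned shift times 1000 divided by the sampling rate; at an assumed 30 Hz, a shift of 2 samples is about 67 ms. The function name is hypothetical.

```python
def estimate_lag(reference: list[float], delayed: list[float], max_lag: int) -> int:
    """Shift (in samples) that best aligns `delayed` with `reference`.

    Both signals are mean-centered; for each candidate shift the mean product
    of the overlapping samples is computed and the largest score wins.
    """
    def centered(xs: list[float]) -> list[float]:
        m = sum(xs) / len(xs)
        return [x - m for x in xs]

    a, b = centered(reference), centered(delayed)
    best_lag, best_score = 0, float("-inf")
    for lag in range(max_lag + 1):
        n = min(len(a), len(b) - lag)
        if n <= 0:
            break
        score = sum(a[i] * b[i + lag] for i in range(n)) / n
        if score > best_score:
            best_lag, best_score = lag, score
    return best_lag
```

Measuring the displayed signal is the hard part in practice (for instance, filming the real leg and the HMD output together), since the end-to-end delay includes rendering and display, not just tracking.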

Several options are open for future improvements to the immersion provided by the system:

• Adding 3D vision. Like other HMDs, the one used has two cameras and two screens (one per eye), making it possible to provide a 3D view of the real and virtual worlds.

• Adding shadows cast by the 3D models.

• Using sensors that detect muscular activity (such as electromyography) to trigger designated animations of the virtual limb, overcoming the limitation of parallel movements.


¹ During the abduction/adduction movement, the tip of the foot moves left or right.
² During plantar flexion, the tip of the foot moves down, while dorsiflexion moves the toes upward.


6 Conclusion


In this work we developed a system for the treatment of phantom limb pain based on augmented reality, 3D modeling and 3D tracking. We chose to work with commercial devices in order to study the limitations of current technologies for a viable home-care treatment of phantom limb pain.

Despite the limitations discussed in the previous section, most of which result from the use of commercial devices, entertaining exercises should help provide enough immersion to compensate for some of these restrictions. Furthermore, the rapid evolution of sensors available on the market might soon close the gap with more expensive devices, allowing more accurate tracking of body and head movements as well as better visualization of augmented reality.

In this paper we also provided a first qualitative discussion of the capabilities and limitations of such a system. Tests with a first, limited number of amputee patients will be performed in the coming months.
Abstract. Today, dealing with Alzheimer’s disease (AD) involves a combination of pharmaceutical and non-pharmaceutical treatment. However, current drugs do not, and potential future drugs might not, improve quality of life. Evidence suggests that psychosocial interventions, such as educational and arts programs, do in fact have such a benefit; supportive and enriching information technology may be more important than biotechnology (Whitehouse, 2013). Non-pharmaceutical treatment, including physical as well as mental exercise, therefore seems to perform better. Mental exercise takes many forms, from simple crossword puzzles to sophisticated video games that exercise different cognitive skills. The main objective of this report is to present the results of a computer-based intervention program for people with AD that takes place in two Day Care Centers of the Greek Association of Alzheimer's Disease and Related Disorders in Thessaloniki, Greece. A significant amount of data is available on patients who have taken part in intervention programs since 2009; for the purposes of this study, we included data from a one-year period only. These patients were tested before and after each intervention program (pre-test and post-test). We compared these data to examine how the program performs and which cognitive skills show the greatest improvement. The results showed that patients’ overall scores were preserved over this period and improved slightly, a promising result indicating that this intervention program has positive effects.


Keywords: computerized cognitive training, Alzheimer’s disease, cognitive re- 
habilitation. 


1 Introduction 


According to recent data, the number of elderly people is expected to increase dramatically. Indeed, it has been suggested that advances in the medical sciences, combined with a healthy lifestyle, can help us live longer than before and improve our quality of life. As the population ages, it becomes essential to make elderly people's lives easier, so that they can live on their own without depending on someone else to help them perform their everyday activities. However, it is commonly accepted that as adults grow older, their brains also age and become progressively weaker over the years. A weakened brain can result in reduced cognitive ability and performance, and, consequently, the individual might not be able to perform daily activities, or even to take care of him/herself. This is a basic characteristic of dementia and its most common form, Alzheimer’s disease (AD).

AD is a neurodegenerative disease that progressively destroys brain cells and the interconnections between them. As a result, a patient with this disease loses core functions and abilities day by day, with symptoms such as reduced memory capacity, disorientation, and declining judgment and reasoning. S/he may also exhibit reduced self-control and listening and speaking disorders, such as problems naming objects or people and understanding text and speech, as well as reduced visual-spatial perception. In the later stages of the disease, the patient may lose core abilities and functions to the point of being unable to live independently, as he may not be able to move, walk, feed himself or get dressed.

Unfortunately, attempts to find a pharmaceutical treatment for AD have reached a dead end: no medicine can heal patients and restore their prior condition. However, some treatments that address the disease’s symptoms are available and already in use, and research is ongoing. The best way to deal with AD is to provide each patient with appropriate medication to improve specific biological indicators. These medicines cannot prevent AD from progressing, but they can reduce the symptoms and temporarily slow progression, improving patients’ quality of life and supporting their caregivers.

Health associations all over the world are now focusing their research on better treatments that aim to delay the onset and development of AD. So far, the best way to treat the disease has proven to be a combination of pharmaceutical treatment and cognitive training (CT), which can be remarkably useful in improving mental abilities and brain function. Cognitive training is defined as an intervention that uses properly structured exercises to improve, maintain or restore mental function (Valenzuela, 2008). CT can be used to limit and compensate for the decline of affected cognitive abilities. Another term for CT is “brain fitness”, because it is possible to create new brain cells and train the brain to discover alternative ways of performing functions controlled by damaged brain regions. A characteristic advantage of CT is that it does not demand much effort from patients, since they are not involved in complex activities but in simple, everyday activities similar to those they already perform.

CT can be implemented in a variety of ways with different tools and stimuli, but certain processes are fundamental: repeating actions that are common in a person’s life, and providing appropriate guidance, support and help to the patient. Suitable tools for a CT program are electronic cognitive exercises or, more generally, computer-based CT, which can use different modalities for such activities. Electronic exercises are usually delivered on a PC or a portable device (smartphone, tablet), which are appropriate tools for repeating procedures and organizing activities according to each person’s needs. A core principle is the ability to provide content adapted to each person’s mental status, needs, goals and expectations. A CT program should also motivate and stimulate the user to engage regularly in realistic situations and activities, so that he can transfer the acquired knowledge to real-life scenarios.

Through CT intervention programs, an individual could improve existing core functions or even develop new, alternative ones that allow him/her to lead, as far as possible, a normal life with a better quality of life. The ability to adapt the content to cultural characteristics or to each user’s cognitive status, and ease of customization in general, are also important for creating personalized activities. Moreover, computer-based applications offer instant monitoring of every user, as well as data collection and metrics for each action in the electronic environment, so that performance and overall progress can be monitored; this also keeps users informed and gives them appropriate feedback. Feedback is important because it enhances the user’s performance, leads to better results and motivates him to perform better, thereby strengthening his participation and reducing his disorientation. Finally, electronic CT applications can include rich multimedia elements such as images, audio and video, making activities more attractive, pleasant and enjoyable for users.
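The per-action data collection described above can be as simple as logging, for every response, whether it was correct and how long it took, and then summarizing each session for monitoring and feedback. This is a hypothetical sketch; the class and field names are illustrative and do not describe the logging format of any specific CT software.

```python
from dataclasses import dataclass, field
from statistics import mean


@dataclass
class SessionLog:
    """Hypothetical per-session log of exercise responses."""
    correct: list[bool] = field(default_factory=list)
    response_s: list[float] = field(default_factory=list)

    def record(self, is_correct: bool, seconds: float) -> None:
        self.correct.append(is_correct)
        self.response_s.append(seconds)

    def summary(self) -> dict:
        """Trials, accuracy (0 to 1) and mean response time for the session."""
        if not self.correct:
            return {"trials": 0, "accuracy": None, "mean_response_s": None}
        return {
            "trials": len(self.correct),
            "accuracy": sum(self.correct) / len(self.correct),
            "mean_response_s": mean(self.response_s),
        }
```

Comparing these summaries across sessions is one straightforward way to track a user's overall progress and to give the immediate feedback the paragraph describes.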

The Greek Association of Alzheimer's Disease and Related Disorders (GAADRD) offers a variety of services for patients and their caregivers, including cognitive therapies for enhancing memory, attention and language. Besides traditional forms of therapy, such as cognitive tasks and exercises and cognitive music therapy, there are also computerized cognitive exercises for attention and language practice on personal computers (PCs). The main intervention program consists of exercises focusing on memory and attention enhancement: each patient works on his own PC on several exercises specifically designed for memory, logic, verbal, numeric and visual-spatial training that improve the patient’s corresponding cognitive functions. These exercises demand considerable attention, processing speed and memory effort, and the difficulty increases as the patient’s cognitive status improves. The program runs twice a week.

In addition, a training program allows patients to practice with and become familiar with computers and technology. It includes educational exercises on operating a PC and acquiring basic skills such as working on Windows-based platforms, using Office suite applications (MS Word, Excel), browsing the internet and using e-mail services. This program also usually runs twice a week. Its main goal is to familiarize people with no previous experience with a PC and to teach them new skills using current technology.

Each exercise has five difficulty levels, matched to each patient’s mental status, so the exercises are suitable for both low-level and high-level patients; in addition, they do not require any previous knowledge of computers or technical training. The program takes place in a room with eight PCs, so that every patient has his own workstation, and all of them have touchscreens, which let patients use the computer simply by touching specific spots on the screen.
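The exercises adapt across five difficulty levels as the patient improves, but the rule the software uses is not described. A hypothetical rule, with the 80% and 50% accuracy thresholds chosen purely for illustration, might look like this:

```python
def next_level(level: int, accuracy: float, up: float = 0.8, down: float = 0.5,
               min_level: int = 1, max_level: int = 5) -> int:
    """Hypothetical adaptation rule applied after each exercise session."""
    if not 0.0 <= accuracy <= 1.0:
        raise ValueError("accuracy must be between 0 and 1")
    if accuracy >= up:
        return min(level + 1, max_level)  # doing well: make it harder
    if accuracy < down:
        return max(level - 1, min_level)  # struggling: make it easier
    return level  # otherwise stay at the current level
```

Leaving a band between the two thresholds where the level stays put keeps a patient from bouncing between levels after every session.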

Every patient has a record in GAADRD’s database, which uses the OpenClinica format configured specifically for GAADRD’s needs. The database holds each patient’s personal information, demographic characteristics, medical and psychological test results and other important information. We gathered data from 41 patients who attended a specific computer-based intervention program designed according to each patient’s mental status. Some patients started in 2008 and still participate in non-pharmaceutical intervention programs. Thus, our data span nearly five years, but we selected data from only a specific twelve-month period in order to obtain a more homogeneous sample.

In the following sections we present the sample profile and the patients’ characteristics, describe the method followed and the intervention program, report the results we gathered, and finally discuss the study and draw conclusions.


2 Method 


Participants. The selected patients fulfilled specific criteria. First of all, they were aware of their memory deficits, they did not suffer from depression or any other psychiatric or neurological disorder, and their Mini-Mental State Examination score was 24 or above. All of them had preserved satisfactory sensory abilities, had no speech or language disorders and were not taking cholinesterase inhibitors. It is important to mention that the patients’ diagnosis was “Mild Cognitive Impairment” (MCI). MCI is an intermediate condition between normal aging and dementia: cognitive function normally declines over the years, this decline may lead to dementia, and MCI is the stage just before dementia appears.
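The inclusion criteria can be stated compactly as a predicate over each candidate's screening data. The record fields below are hypothetical names; only the criteria themselves, including the Mini-Mental State Examination threshold of 24, come from the text.

```python
from dataclasses import dataclass


@dataclass
class Candidate:
    """Screening data for one patient (field names are illustrative)."""
    aware_of_memory_deficits: bool
    has_psychiatric_or_neurological_disorder: bool  # including depression
    mmse: int
    satisfactory_sensory_abilities: bool
    has_speech_or_language_disorder: bool
    on_cholinesterase_inhibitors: bool


def eligible(c: Candidate) -> bool:
    """True only if every inclusion criterion listed under Participants holds."""
    return (c.aware_of_memory_deficits
            and not c.has_psychiatric_or_neurological_disorder
            and c.mmse >= 24
            and c.satisfactory_sensory_abilities
            and not c.has_speech_or_language_disorder
            and not c.on_cholinesterase_inhibitors)
```

Writing the criteria this way makes the boundary explicit: a score of exactly 24 qualifies, 23 does not.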

Each patient was evaluated before and after participating in the intervention program, and each program lasted at least one year. Some of the patients had already participated in these interventions before, and some still take part. Both the pre-test and the post-test include the same measurements from the following neuropsychological tests:


• Mini-Mental State Examination (MMSE), a short set of tests used to detect possible cognitive decline.

• Clinical Dementia Rating (CDR), a scale used to distinguish different stages of dementia.

• Functional-Cognitive Assessment Scale (FUCAS), a scale based on personal interviews with the patient.

• Functional Rating Scale for Symptoms of Dementia (FRSSD), a scale for symptoms of dementia based on interviews with caregivers.

• Test of Everyday Attention (TEA), which tests the level of attention through three different activities. It consists of the individual subtests TEA1-A, TEA1-B, TEA4-A, TEA4-B and TEA6.

• Trail Making Test (TMT) Part B, which tests working memory and executive function.

• Rey-Osterrieth Complex Figure Test (ROCF), which tests visual-spatial memory through two individual tests, ROCFT1 and ROCFT3.

• Rey Auditory Verbal Learning Test (RAVLT), which tests verbal learning and memory and consists of two tests, RAVLT1 and RAVLT2.

• Rivermead Behavioural Memory Test (RBMT), a scale that tests episodic memory through two tests, RBMT1 and RBMT2.

• Verbal Fluency Task, which tests how easily a patient produces words in semantic and/or phonological categories.
Intervention Program. The main intervention program consists of a number of
memory exercises performed on computers. It takes place in a room with eight PCs,
where each patient sits in front of a touchscreen and performs the exercises. It is
aimed at both low-level and high-level patients and does not require knowledge of
computers. Each exercise has 5 levels of difficulty, and the exercises fall into the
following categories:


1. visual-spatial exercises
2. speech exercises
3. numerical exercises
4. reasoning exercises
5. memory exercises


The training program is aimed mainly at high-level patients or caregivers. It takes
place in a classroom with groups of 6-8 people. Participants should preferably own a
PC in order to run the exercises that are assigned. The modules are:


1. Usability and familiarity with a PC - Microsoft Windows XP 
2. Word Processor - Microsoft Office Word 2007 

3. Internet use - Internet Explorer 

4. Using accounts - Microsoft Office Excel 2007 


The software used for the interventions is “Complete Brain Workout”, a commercial
product. It has forty cognitive training activities distributed across the five
categories mentioned above, and it is intended to stimulate the brain by exercising the
mind and to improve concentration and memory. The exercises it includes are, among
others: Number Recall, Stepping Stones, What’s in the Box, Boxes, Linker and Path Finder.
Further information is available at the following link: http://www.oak-
systems.co.uk/index.php?option=com_content&task=view &id=5 1 &Itemid=9 .


3 Results 


For the statistical analysis we used IBM SPSS 19. We performed a descriptive analysis
and paired t-tests for each variable in Table 2. Table 1 gives statistics on the age
and years of education of our sample: the mean age is approximately 67 years
(Std. Dev. = 8.2 years) and the mean education is approximately 11 years
(Std. Dev. = 4.6 years). The sample is therefore fairly well educated and not very old.
Table 2 gives the mean and Std. Dev. for every test that the subjects took before and
after the intervention program, and Table 3 gives the paired samples statistics, which
indicate which pairs of tests present significant differences in their scores. Overall,
most post-test scores show improvement. According to the Sig. (2-tailed) index, four
tests show significant differences (p < 0.05) before and after the intervention
program: RAVLT2, TEA1-A, TEA1-B and VFT.
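As a check on Table 3, the paired t-test can be recomputed from the summary values
alone: t = (mean of the differences) / (Std. Dev. of the differences / √n) with n − 1
degrees of freedom, and the 95% confidence interval is the mean difference ± t* · SE.
The following is a minimal sketch under our own assumptions, not the authors' SPSS
procedure: it uses only the Python standard library (numerical integration of the
Student t density instead of a statistics package) and takes n = 41 from Table 1.

```python
import math


def t_pdf(x: float, df: int) -> float:
    """Density of Student's t distribution with df degrees of freedom."""
    log_norm = (math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
                - 0.5 * math.log(df * math.pi))
    return math.exp(log_norm - (df + 1) / 2 * math.log1p(x * x / df))


def two_tailed_p(t: float, df: int, steps: int = 2000) -> float:
    """P(|T| >= |t|) = 1 - 2 * (integral of the density over [0, |t|]), via Simpson's rule."""
    h = abs(t) / steps  # steps is even, as Simpson's rule requires
    total = t_pdf(0.0, df) + t_pdf(abs(t), df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    return max(0.0, 1.0 - 2.0 * total * h / 3.0)


def t_critical(df: int, alpha: float = 0.05) -> float:
    """Two-tailed critical value t* with P(|T| >= t*) = alpha, found by bisection."""
    lo, hi = 0.0, 10.0
    for _ in range(60):
        mid = (lo + hi) / 2
        if two_tailed_p(mid, df) > alpha:
            lo = mid  # p is still too large, so t* lies further out
        else:
            hi = mid
    return (lo + hi) / 2


def paired_t_from_summary(mean_diff: float, sd_diff: float, n: int, alpha: float = 0.05):
    """Paired-samples t-test from the mean and SD of the (pre - post) differences."""
    se = sd_diff / math.sqrt(n)          # standard error of the mean difference
    t = mean_diff / se
    df = n - 1
    margin = t_critical(df, alpha) * se  # half-width of the (1 - alpha) CI
    return t, df, two_tailed_p(t, df), (mean_diff - margin, mean_diff + margin)


# Row "preTEA1-A - postTEA1-A" of Table 3, with n = 41 from Table 1
t, df, p, ci = paired_t_from_summary(-3.65854, 8.15049, 41)
print(f"t({df}) = {t:.3f}, p = {p:.3f}, 95% CI = ({ci[0]:.5f}, {ci[1]:.5f})")
```

Running the same function on other rows reproduces their p-values and intervals, which
is also a convenient way to catch transcription errors in the tables.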


Table 1. Descriptive Statistics


Descriptive Statistics 
N=41 Age Education 
Mean 66,8049 11,4146 
Std. Deviation 8,24384 4,59334 
Table 2. Means and Std. Dev. for all the tests performed


Test Mean Std. Dev. 
preMMSE 28,3171 1,73838 
postMMSE 28,1463 65168 
preFRSSD 2,7073 2,27205 
postFRSSD 3,0000 34164 
preRAVLT1 4,7317 1,89769 
postRAVLT1 5,1707 1,43008 
preRAVLT2 7,7317 4,12931
postRAVLT2 6,2195 3,11859 
preTEA1-A 26,3415 8,66490 
postTEA1-A 30,0000 9,84886 
preTEA1-B 44,5854 10,94069 
postTEA1-B 48,6341 8,56959 
preTEA4-A 7,1573 2,93934 
postTEA4-A 7,5122 2,35714
preTEA4-B 6,41 4,567 
postTEA4-B 7,64 4,520 
preTEA6 4,52 1,907 
postTEA6 4,13 1,494 
preRBMT1 12,8293 3,49930 
postRBMT1 12,5366 3,43946 
preRBMT2 11,4268 3,73591 
postRBMT2 10,9756 4,18621 
preROCFT1 31,8902 4,21235 
postROCFT1 32,2683 5,45103 
preROCFT3 16,5122 7,35568 
postROCFT3 17,5976 7,28974 
preTRAILB 208,2000 109,74871 
postTRAILB 208,7250 103,66069 
preVFT 11,11 4,237 
postVFT 11,96 4,762 
preFUCAS 43,1707 1,73064 
postFUCAS 42,9756 1,66565 
Table 3. Paired Samples Test indicating significant statistical differences


Test Mean Std. Dev. 95% CI Lower 95% CI Upper p (2-tailed)
preMMSE - postMMSE ,17073 1,59534 -,33282 ,67428 ,497
preFRSSD - postFRSSD -,29268 2,15921 -,97421 ,38885 ,391
preRAVLT1 - postRAVLT1 -,43902 1,78954 -1,00387 ,12582 ,124
preRAVLT2 - postRAVLT2 1,51220 4,03189 ,23958 2,78482 ,021
preTEA1-A - postTEA1-A -3,65854 8,15049 -6,23115 -1,08592 ,006
preTEA1-B - postTEA1-B -4,04878 8,95531 -6,87542 -1,22214 ,006
preTEA4-A - postTEA4-A -,35488 2,78271 -1,23321 ,52345 ,419
preTEA4-B - postTEA4-B -1,233 6,633 -3,327 ,860 ,241
preTEA6 - postTEA6 ,391 1,402 -,051 ,834 ,081
preRBMT1 - postRBMT1 ,29268 2,71794 -,56520 1,15057 ,494
preRBMT2 - postRBMT2 ,45122 3,69257 -,71430 1,61674 ,439
preROCFT1 - postROCFT1 -,37805 6,17230 -2,32627 1,57017 ,697
preROCFT3 - postROCFT3 -1,08537 5,53613 -2,83279 ,66205 ,217
preTRAILB - postTRAILB -,52500 108,70638 -35,2909 34,2409 ,976
preVFT - postVFT -,850 2,612 -1,675 -,025 ,044
preFUCAS - postFUCAS ,19512 1,22922 -,19287 ,58311 ,316


4 Discussion and Conclusions 


Results showed statistically significant differences between pre- and post-test scores
in four psychometric tests (namely RAVLT2, TEA1-A, TEA1-B and VFT). Concerning the
RAVLT2 test, there is a reduction between the pre and post scores. Since RAVLT2
examines learning skills, this suggests that after a year of intervention the patients
had lower learning abilities. Further research is needed to compare this result with a
control group that does not participate in an intervention program, in order to examine
whether this reduction is the same and/or occurs at the same rate in both groups. The
next two tests with statistically significant differences belong to the Test of
Everyday Attention (TEA1-A and TEA1-B). Both tests examine the level of attention, and
in this case the subjects scored better in both tests after the intervention. This
suggests that the program improves selective attention and the patients' ability to
stay focused. Considering that the intervention program aims to foster attention, we can
say that it fulfills this purpose. The last test with a statistically significant
difference is the VFT, which examines verbal fluency and executive functions. The
results indicate an improvement in verbal fluency and related language functions. This
improvement is important because verbal fluency also draws on high-level attention
abilities.

Concerning the rest of the tests, even though there is no statistically significant
difference, the findings are encouraging regarding the mental status of the subjects.
More specifically, the scores in three tests (namely TEA6, FUCAS and TRAILB) were
slightly reduced or remained almost the same (TRAILB). This is a positive finding,
considering that for these tests a lower score indicates better performance.
Furthermore, there is a reduction in the RBMT and RAVLT2 scores, probably because the
intervention program mainly targets attention, but further research is needed to
investigate this in more depth.

In conclusion, most test scores improved after a year of intervention, and this is a
promising result, as there was no further progression of impairment. An MCI patient is
expected to get worse as the years go by, but the results indicate stability or
improvement; thus, this intervention program appears to have produced positive effects
in general.
Abstract. This paper presents the development and preliminary evaluation
of a Virtual Reality-based system for training in dental anesthesia.
The development focused on the simulation of an anesthesia procedure task.
The evaluation addressed graphic and haptic issues and involved experts
in the dentistry area. The assessment targeted attributes that may
influence human-computer interaction and hinder realism, an important
challenge in systems of this type. The attributes selected were: the
update rate, the appearance of the virtual models, the number of
viewpoints of the virtual environment, and the characteristics of the
haptic device. Although constraints were found, in the experts' perception
the system may provide realism and help with the training of certain
tasks.


Keywords: dental anesthesia, human-computer interaction, Virtual 
Reality. 


1 Introduction 


Systems based on Virtual Reality (VR) are widely used in the health area,
especially in the training context, for acquiring knowledge and sensorimotor
skills.

This may be related to the benefits provided by VR, such as: reducing risks
to patients from unsuccessful procedures and avoiding damage to health [1];
increasing the safety of novices, who can practice several times before dealing
with real patients [2] [3] [4]; automatic evaluation of performance [5]; and
repeated training.

Moreover, VR allows different levels of training, situations and degrees of
difficulty [6], and it may minimize or eliminate the costs of maintaining physical
laboratories that rely on cadavers or animals. Although cadavers and animals
provide physical presence, cadavers present physiological differences and animals
have anatomical divergences when compared to human beings. Additionally, their
use involves ethical issues [1].
Another benefit of VR is flexibility: mannequins, for example, allow physical
presence, but they have limitations in physiological replication and anatomical
variation [1].

A procedure that does not yet enjoy such benefits is training in dental
anesthesia administration, especially the procedure to block the inferior alveolar
nerve [7]. This procedure has a high number of failures, and novices commonly
train on their colleagues [8].

Therefore, this paper presents the development and preliminary experiments 
of a VR system for training in dental anesthesia. An important challenge in 
VR systems is the degree of realism, commonly defined by subjective tests with 
experts [9]. The realism in this context is directly influenced by the human- 
computer interaction, including the haptic approach, which plays an important 
role in applications of this type. 

The paper is organized as follows: in Section 2 the main related works are
described; Sections 3 and 4 present the implementation of the system and the
experiments, respectively; Section 5 presents the results and, finally, Section 6
the conclusions.


2 Related Work 


There are a number of simulators for dental procedures [11] [13] [14]
[15] [16] [17], but only one virtual training system for the purpose of this
work, and it has not been formally assessed [18].

However, as the procedure to be simulated involves needle insertion, a set of 
similar training systems can be mentioned. Table 1 presents the main works,
including the procedure, the region of the body and whether the procedure is 


Table 1. Needle insertion procedures 


Number Procedure Target region MI 
1 Anesthesia Spinal No
2 Biopsy - Yes
3 Biopsy [22] - No
4 Biopsy Lumbar No
5 Biopsy Prostate Yes
6 Biopsy [25] Thyroid Yes
7 Brachytherapy Prostate Yes
8 Catheter - Yes
9 Chinese acupuncture [28] -
10 Epidural anesthesia Spinal No
11 General [31] [32] - Yes
12 Regional anesthesia [33] Inguinal No
13 Regional anesthesia - No
14 Suturing [35] -
15 Vertebroplasty [36] Spinal No
aided by Medical Images (MI). The term General in the Procedure column means
that the needle insertion is not restricted to a specific procedure. A hyphen (-)
indicates that the paper does not present the information or that the procedure
may be employed in several regions.

Row 8 in Table 1 lists the insertion of catheters, procedures that begin with the
insertion of a kind of needle. Moreover, it is worth noting that other, more complex
minimally invasive procedures also start with the insertion of certain instruments,
such as laparoscopy, endoscopy, arthroscopy and endovascular procedures [1].


3 Development of the System 


The development of the system started with a requirements elicitation conducted
in collaboration with a dentistry learning institution (School of Dentistry of
Bauru - University of Sao Paulo). Students and professors of the institution
provided videos and details about the dental anesthesia procedure, and also
allowed us to watch real procedures [8].

Considering that the simulator is based on VR, human-computer interaction is a
fundamental characteristic. According to [37], human-computer interaction in
virtual environments may be classified into the following categories: navigation,
selection and manipulation, system control and symbolic input.

Our system considered two categories: 


— Navigation - visualization of the models from several viewpoints using the
keyboard, allowing the study of the anatomy of the head (bones, blood vessels,
gums, muscles, tongue, teeth and nerves) from various viewing angles;

— Manipulation - performed through a specific device, which offers a haptic
sensation and captures position and rotation movements [38], used to modify the
synthetic models that represent instruments (syringes or needles).


There are several types of haptic feedback, such as [39]: force, tactile,
kinesthetic and proprioceptive. In this context, force feedback was adopted due
to the task and the characteristics of the device. The force feedback was defined
as a constant value for the anatomical structures.
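A constant force model of this kind can be sketched as follows. This is an
illustrative assumption, not the API of the framework [40]: each structure gets one
hypothetical constant force magnitude, applied opposite to the needle's motion while
the needle touches that structure, and capped at the device's maximum force of 3.3 N
reported in Section 5. The `Structure` class and `feedback_force` function are
hypothetical names.

```python
import math
from dataclasses import dataclass
from typing import Optional, Tuple

Vec = Tuple[float, float, float]

DEVICE_MAX_FORCE_N = 3.3  # maximum force of the haptic device, as reported in the results


@dataclass
class Structure:
    name: str
    force_n: float  # hypothetical constant resistive force for this structure, in newtons


def feedback_force(needle_velocity: Vec, touched: Optional[Structure]) -> Vec:
    """Constant-magnitude force opposing the needle motion while it touches a structure."""
    if touched is None:
        return (0.0, 0.0, 0.0)
    speed = math.sqrt(sum(v * v for v in needle_velocity))
    if speed == 0.0:
        return (0.0, 0.0, 0.0)  # no motion, so there is no direction to resist
    magnitude = min(touched.force_n, DEVICE_MAX_FORCE_N)  # never exceed the device limit
    return tuple(-magnitude * v / speed for v in needle_velocity)


print(feedback_force((0.0, 0.0, -2.0), Structure("gum", 1.0)))
```

Because the magnitude is the same wherever the needle is inside a structure, this
model cannot distinguish tissues with different stiffness unless each structure is
given its own value.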

In the second step, the system was implemented using a VR framework [40]
that allows: (1) loading 3D (three-dimensional) synthetic models to represent
anatomical structures and instruments; (2) specifying properties of the models
(elasticity, viscosity and thickness); (3) detecting collisions among models;
(4) deforming models, modifying their shapes; and (5) supporting interaction
devices, including a haptic device and a dataglove. Figure 1 shows the screen
of the training system, containing anatomical structures and the syringe.

Due to the complexity of its development, in the third step, the procedure was 
segmented into tasks. The target task is the correct manipulation of the syringe, 
Fig. 1. Screen of the training system


encompassing certain parameters: velocity of manipulation, regions reached by 
the needle, movements of translation and rotation of the syringe and duration 
of the injection. 

During the insertion, the needle must be at an angle of about 45 degrees;
the motions of the syringe must be slow; the needle must not reach nerves, bones,
blood vessels or skin; and it must remain in the tissues for some minutes before
being withdrawn, simulating the time for the anesthetic injection. In this setup,
the user manipulates the syringe using the haptic device and navigates in the
environment using the keyboard.
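The task rules above can be turned into an automatic check of a recorded insertion.
The following sketch is ours: the 45-degree target and the forbidden structures come
from the text, while the angle tolerance, speed limit, minimum dwell time, reference
axis and the `Sample` record layout are hypothetical placeholders.

```python
import math
from dataclasses import dataclass
from typing import List, Sequence, Tuple

Vec = Tuple[float, float, float]

# Structures the needle must not reach, as listed in the text
FORBIDDEN = {"nerve", "bone", "blood vessel", "skin"}

# Hypothetical thresholds chosen for illustration only
TARGET_ANGLE_DEG = 45.0
ANGLE_TOLERANCE_DEG = 10.0
MAX_SPEED_MM_S = 5.0
MIN_DWELL_S = 60.0


@dataclass
class Sample:
    t: float                 # time stamp, seconds
    tip: Vec                 # needle tip position, millimeters
    direction: Vec           # needle axis (any non-zero length)
    contacts: Sequence[str]  # names of structures touched by the needle
    inserted: bool           # True while the tip is inside the tissue


def angle_deg(a: Vec, b: Vec) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    norms = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return math.degrees(math.acos(max(-1.0, min(1.0, dot / norms))))


def check_insertion(samples: List[Sample], reference_axis: Vec) -> List[str]:
    """Return rule violations for a recorded insertion (empty list = no violation)."""
    errors = []
    for prev, cur in zip(samples, samples[1:]):
        dt = cur.t - prev.t
        if dt > 0 and math.dist(prev.tip, cur.tip) / dt > MAX_SPEED_MM_S:
            errors.append(f"movement too fast at t={cur.t:.1f}s")
    for s in samples:
        off = abs(angle_deg(s.direction, reference_axis) - TARGET_ANGLE_DEG)
        if s.inserted and off > ANGLE_TOLERANCE_DEG:
            errors.append(f"insertion angle out of range at t={s.t:.1f}s")
        for name in sorted(FORBIDDEN.intersection(s.contacts)):
            errors.append(f"needle reached {name} at t={s.t:.1f}s")
    inserted_times = [s.t for s in samples if s.inserted]
    # Assumes one continuous insertion: dwell = first to last inserted sample
    dwell = inserted_times[-1] - inserted_times[0] if inserted_times else 0.0
    if dwell < MIN_DWELL_S:
        errors.append(f"needle kept in tissue for only {dwell:.0f}s")
    return errors
```

Returning a list of messages rather than a single pass/fail flag makes the same check
usable for the automatic performance evaluation mentioned in the introduction.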

In this step, the haptic device was also adapted. The adaptation consisted of
replacing the device's pen with a Carpule syringe (Figure 2) [8], the instrument
used by dentists in the anesthesia procedure. For the tests, different versions of
the same system were generated, considering several attributes.


4 Experiments Description 


As the degree of realism is usually defined by experts in the area, the system
was evaluated in experiments with a group of experts from the dentistry
institution mentioned above, comprising 2 teachers and 2 students who had already
performed the procedure. The experiments considered two issues, graphic and
haptic, which directly influence human-computer interaction.

Considering the experts as system users, the hypotheses to be corroborated 
were: (1) users prefer the models with textures; (2) users prefer two viewpoints 
to determine the position and rotation of the syringe, as well as the distance 
between it and anatomical structures; (3) the maximum force provided by device 
is not enough to simulate the needle insertion; (4) the workspace of the device is 
not enough to capture all movements of the insertion task and (5) users prefer 
the device adapted with the real syringe. 

The graphic issue addressed attributes such as the appearance of the models
(wire, color and texture; Figures 3, 4 and 5) and the number of viewpoints
(Figures 5 and 6).
Fig. 2. Adaptation of the haptic device replacing the pen with the Carpule syringe


Fig. 3. Virtual models in wire mode 


The haptic issue involved examining the ergonomic attributes of the device
(shape of the manipulation “pen”, workspace and maximum force). The update
rate during the human-computer interaction, a critical point in the haptic issue,
was analyzed at several values to determine the minimum value required for the
task. The frequency was decreased step by step according to the users' perception,
in order to identify acceptable delays in the haptic human-computer interaction.
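The descending procedure can be sketched as a loop over candidate rates, with the
user's judgment modeled as a callback. The candidate rates and the `acceptable`
callback are hypothetical; in the experiments the judgment came from the experts.
The `period_ms` helper also shows the arithmetic behind the rates discussed in the
conclusion: at 40 Hz the device is updated every 25 ms, versus every 1 ms at 1,000 Hz.

```python
from typing import Callable, Iterable, Optional


def lowest_acceptable_rate(rates_hz: Iterable[float],
                           acceptable: Callable[[float], bool]) -> Optional[float]:
    """Descend through candidate update rates and return the lowest one judged acceptable.

    Stops at the first rejected rate; returns None if even the highest rate is rejected.
    """
    lowest = None
    for rate in sorted(rates_hz, reverse=True):
        if not acceptable(rate):
            break
        lowest = rate
    return lowest


def period_ms(rate_hz: float) -> float:
    """Time between two haptic updates, in milliseconds."""
    return 1000.0 / rate_hz


# Hypothetical session in which the user accepts every rate of at least 40 Hz
rate = lowest_acceptable_rate([1000, 500, 300, 100, 40, 20, 10], lambda hz: hz >= 40)
print(rate, period_ms(rate))
```

Stopping at the first rejection, rather than testing every rate, mirrors a descending
method of limits and keeps the number of trials per expert small.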

To collect the users' preferences, questionnaires were administered after the
experiments. The task to be performed was explained in detail before the
experiments, including a presentation of the training system.
Fig. 4. Virtual models in colors


Fig. 5. Virtual environment with textures and presented in two viewpoints 


Fig. 6. Virtual environment presented in one viewpoint 


5 Results 


After analyzing the answers, regarding the graphic issue, the experts preferred the
models with textures and two viewpoints. The two viewpoints improved orientation
in the virtual environment; however, the textures were not considered faithful
to reality.

In the haptic issue, the adaptation of the device was considered a positive
point, because the pen of the device differs from the syringe, which compromises
realism. The experts highlighted the importance of force feedback, and the
maximum force provided by the device (3.3 newtons) was considered enough
to simulate the contacts with the anatomical structures, including bones, which
must offer the highest stiffness during contact with the needle.

The capture and reproduction of the movements between the device and the
syringe in the virtual environment were considered suitable. The workspace of
the device (170 millimeters, 120 millimeters and 70 millimeters in the x, y
and z axes, respectively) was not considered enough, especially in the z axis.
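A simple way to anticipate this limitation is to compare the extent of a planned
syringe path with the device workspace along each axis. The sketch uses the workspace
dimensions given above; the example path is hypothetical. It checks only the extent of
the motion, not where the device origin lies, so passing it is necessary but not
sufficient for the motion to fit.

```python
from typing import Dict, List, Sequence, Tuple

Vec = Tuple[float, float, float]

# Workspace of the haptic device reported in the text, in millimeters
WORKSPACE_MM: Dict[str, float] = {"x": 170.0, "y": 120.0, "z": 70.0}


def axes_exceeding_workspace(trajectory: Sequence[Vec]) -> List[str]:
    """Return the axes along which a trajectory spans more than the device workspace."""
    exceeded = []
    for i, axis in enumerate("xyz"):
        values = [p[i] for p in trajectory]
        if max(values) - min(values) > WORKSPACE_MM[axis]:
            exceeded.append(axis)
    return exceeded


# Hypothetical approach-and-insertion path (mm) with long travel along z
path = [(0.0, 0.0, 0.0), (20.0, 15.0, 40.0), (35.0, 30.0, 90.0)]
print(axes_exceeding_workspace(path))
```

With this path only the z axis is flagged, which matches the kind of fix described in
the conclusion: shortening the virtual distance between the syringe and the anatomical
structures reduces the travel required along z.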

A constraint identified was the direction of the force feedback because the 
device provides three degrees of freedom for force feedback (axes x, y and z) and 
the syringe must be inserted diagonally, making the force feedback impossible 
in this position. 

The experts did not want to use navigation in the virtual environment,
preferring to only manipulate the syringe. Finally, in the evaluation of the
update rate, a value of 40 Hertz did not affect the human-computer
interaction, producing acceptable delays.


6 Conclusion 


Considering the results, hypotheses 1, 2, 4 and 5 were confirmed: the
experts who participated in the experiment preferred the models with textures
(hypothesis 1), although the textures should be improved; the experts preferred
two viewpoints to determine the position of the syringe and to orient their
actions inside the virtual environment (hypothesis 2); the workspace of the device
was not enough, which was observed only for movements along the z axis
(hypothesis 4) and was resolved by reducing the distance between the syringe and
the anatomical structures; and the experts preferred the device adapted with the
real syringe (hypothesis 5).

According to the experts, the adaptation of the device provided a way to
hold the instrument as in the real world. Figure 2 shows how people usually
manipulate the device with and without the syringe.

Hypothesis 3 was rejected because the maximum force provided by the device
was considered enough; however, as the device does not offer six degrees of
freedom of force feedback, the direction of the feedback cannot be simulated given
the position of the models and the orientation of the syringe. One possible solution
would be to acquire a device that offers six degrees of freedom of force feedback.

Regarding the update rate, the recommended haptic update rate is about
1,000 Hertz, with some exceptions, for instance, 300 Hertz and 550 to 600
Hertz [1]; nevertheless, the value of 40 Hertz was considered acceptable.

In summary, the degree of realism was considered satisfactory, showing that,
although constraints were found, the system may already contribute to training
of an anesthesia procedure task. Future work includes an analysis of the relation
between the low update rate and the perception of the experts; force feedback values
according to the properties of each anatomical model, including sets of models; tests
with the participation of more experts; and more realistic textures for the models.
Abstract. Stand-alone and networked surgical virtual reality based
simulators have been proposed as a means to train surgical skills with or
without a supervisor near the student or trainee. However, surgical skills
teaching in medical schools and hospitals is changing, requiring the
development of new tools that focus on: (i) the importance of the mentor's
role, (ii) teamwork skills and (iii) remote training support. For these
reasons, a surgical simulator should allow not only training involving a
student and an instructor who are located remotely, but also collaborative
training sessions involving a group of several students adopting different
medical roles.

Collaborative Networked Virtual Surgical Simulators (CNVSS) allow
collaborative training of surgical procedures where remotely located users
with different surgical roles can take part in a training session. Several
works have addressed the issues related to the development of CNVSS
using various strategies. To the best of our knowledge, none has focused
on handling heterogeneity in collaborative surgical virtual environments.
Handling heterogeneity in this type of collaborative session is important
because remotely located users do not all have homogeneous Internet
connections, the same interaction devices and displays, or the same
computational resources, among other factors. Additionally, if
heterogeneity is not handled properly, it will have an adverse impact on the
performance of each user during the collaborative session. In this
paper we describe the development of an adaptive architecture with the
purpose of implementing a context-aware model for collaborative virtual
surgical simulation in order to handle the heterogeneity involved in the
collaboration session.
1 Introduction


Over the last few years, the way surgeons are trained has changed considerably.
Nowadays, new tools that allow surgeons to train and validate their skills before
entering a real operating room with a real patient are in common use. Moreover,
the role of the mentor during surgical training is of great importance, since the
mentor guides the learner not only in the technical aspects of the surgical
procedure but also in supporting the surgeon's professional vocation. Nonetheless,
issues such as the small number of expert surgeons, their location in distant
regions and their lack of time to provide face-to-face training in surgical centers
have made it difficult to apply the mentor-apprentice learning model in this context.

Networked virtual surgical simulators (NVSS) have been proposed to overcome
these issues. NVSS were created to allow a student to be trained remotely by an
instructor. In such a system, the instructor can perform the procedure remotely
while the student not only watches but also feels (through haptic feedback) what
the instructor is touching, without actually participating in the execution of the
surgical procedure. Nevertheless, a surgical procedure usually requires several
medical specialists, each playing a specific role and collaborating with each
other to save the patient's life. Additionally, hand-eye coordination skills are as
important as communication and collaboration skills during a procedure. With
this in mind, CNVSS have been proposed to allow the collaborative training of
remotely located users, with each member playing a role during the training
session.

However, differences in users' machine capabilities and network conditions,
called heterogeneity factors, may affect the level of collaboration achieved by
the users in a CNVSS, which directly affects the purpose of surgical training as
a team. As far as we know, no one has proposed a strategy to handle
heterogeneity in CNVSS and thus mitigate the impact of these factors on
collaboration.

In this paper we describe the development of an adaptive architecture,
extending the SOFA framework developed in [1], with the purpose of implementing
a context-aware model for collaborative virtual surgical simulation in order
to handle the heterogeneity involved in the collaboration session. The proposed
architecture allows real-time modification of simulation variables such as:
(i) mesh resolution, (ii) collision, visual rendering and deformation algorithms,
and (iii) local and remote computation, in order to adapt the CNVSS to the context
of the system (i.e. user preferences and roles, network conditions and machine
capabilities) and optimize the collaboration of the users.

The paper is structured as follows: Section 2 describes similar projects and
research opportunities. Section 3 describes an adaptive architecture proposed
to mitigate heterogeneity issues raised in CNVSS. Section 4 shows the results,
and Section 5 presents conclusions and future work.
2 Related Works


Collaborative Networked Virtual Surgical Simulators (CNVSS) allow collaborative
training of surgical procedures where remotely located users with different
surgical roles can take part in a training session. Several works have addressed the
issues related to the development of CNVSS using various strategies. In [2] and
[3], two specific middleware systems for CNVSS are proposed. These middleware
systems use specific protocols, different architectures and compensation
mechanisms, among other strategies, to address network impairments such as jitter,
delay and packet loss, in order to maintain adequate collaboration and shared state
consistency of the virtual environment.

In [4], [5] and [6] a middleware is proposed to handle network connection
issues while maintaining the consistency of the collaborative surgical virtual
environment. The proposed middleware is composed of: network management
approaches (including services management), collaboration mechanisms, adaptive
protocols, various deformation models, 3D to 2D synchronization and flexible
computation policies. It tries to manage the heterogeneity of the network
connection and machine capabilities of the user by implementing two computation
policies to handle the deformation: (i) a local computation policy, where
deformation computation is off-loaded to individual participants; and (ii) a global
computation policy, where the participant whose hardware has powerful computational
capability is assigned as the server in the system. The bandwidth requirement
for the former is low, since the data transfer only involves the parameters of the
computation.

A high-performance, network-aware, collaborative learning environment is
developed in [7]. It describes a middleware system that monitors and reports network
conditions to network-aware applications, which can self-scale and self-optimize
based on network weather reports. The core system and applications have
been developed within the context of a clinical anatomy testbed. A review of
CNVSS is presented in [8]. It identifies the challenges characterizing these CVE
(Collaborative Virtual Environments), provides a detailed explanation of the
techniques used to address these issues, and finally describes some collaborative
surgical environments developed for different medical applications. This review
does not report strategies to handle heterogeneity in CNVSS.

Yet, to the best of our knowledge, no one has proposed an architecture and
an adaptive model to handle heterogeneity in CNVSS. This paper describes the
development and implementation of an adaptive architecture that provides the
software structure and functionality required to apply an adaptive mathematical
model.


3 Description of CNVSS Network Architecture 


The network architecture describes the functional relationship between the
network elements that compose a CNVSS. Client-server and peer-to-peer are the
most commonly implemented network architectures, each providing different
advantages and disadvantages for the development of CNVSS. Peer-to-peer
architecture computes the simulation on each client machine, providing low
response time but making it difficult to guarantee shared state consistency of the
simulation for each user. By contrast, client-server architecture centralizes the
computation of the simulation on a machine called the server and transmits the
result to each client, so shared state consistency is guaranteed at the cost of
an increased response time for the actions performed by each user. Additionally,
the server can become a bottleneck for the computing load and for the data that
must be communicated through it [9].

To avoid some disadvantages of client-server and peer-to-peer architectures,
a hybrid client-server architecture, based on the one proposed by [10], is
implemented in our CNVSS. This architecture maintains the consistency of the
collaborative virtual surgical environment by centralizing the computation of the
surgical simulation on a server, while preventing the server from becoming a
bottleneck by distributing the computation load of the collision detection and of
the visual and haptic rendering algorithms among the clients.

In the hybrid client-server architecture, the server role consists of computing the deformation of the anatomical structures, and the client role consists of running the visual rendering, collision detection and haptic rendering algorithms locally. Whenever a collision arises between an anatomical structure and a client's surgical instrument (local user input), the client sends to the server the primitives of the anatomical structures that are colliding with the instrument, together with the instrument's position and orientation. Then, considering the collision primitives of all clients, the server calculates the deformation of the anatomical structures and sends the simulation data, composed mostly of the deformed state of organs and tissues, to each client. This process underlies all of the operations supported by our CNVSS, such as probing, attaching, carving, cutting and clip attaching.
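The client/server exchange described above can be illustrated with a minimal single-process Python sketch. It is not the CNVSS or SOFA implementation: the names `CollisionReport`, `Client`, `Server` and `tick`, the radius-based collision test and the "move the vertex halfway toward the tip" deformation are hypothetical stand-ins, and network transmission is replaced by direct method calls. What it does show is the division of labor: clients detect collisions locally, the server combines the reports of all clients into one authoritative state, and every client then adopts that state.

```python
from __future__ import annotations

import math
from dataclasses import dataclass

Vec3 = tuple[float, float, float]


@dataclass
class CollisionReport:
    """What a client sends to the server when its instrument collides."""
    client_id: int
    primitive_ids: list[int]                                  # colliding primitives (vertices here)
    instrument_position: Vec3
    instrument_orientation: tuple[float, float, float, float]  # quaternion; unused by the toy model


class Client:
    """Runs collision detection locally (rendering would also live here)."""

    def __init__(self, client_id: int, vertices: list[list[float]]):
        self.client_id = client_id
        self.vertices = [list(v) for v in vertices]  # local copy of the shared state

    def detect_collision(self, tip: Vec3, radius: float) -> CollisionReport | None:
        hits = [i for i, v in enumerate(self.vertices) if math.dist(v, tip) <= radius]
        if not hits:
            return None
        return CollisionReport(self.client_id, hits, tip, (1.0, 0.0, 0.0, 0.0))

    def apply_state(self, state: list[list[float]]) -> None:
        self.vertices = [list(v) for v in state]


class Server:
    """Owns the authoritative deformation state shared by all clients."""

    def __init__(self, vertices: list[list[float]]):
        self.vertices = [list(v) for v in vertices]
        self.pending: list[CollisionReport] = []

    def receive(self, report: CollisionReport) -> None:
        self.pending.append(report)

    def step(self) -> list[list[float]]:
        # Toy stand-in for a real deformation model: every colliding vertex
        # reported by any client moves halfway toward that client's tip.
        for report in self.pending:
            for i in report.primitive_ids:
                v = self.vertices[i]
                for k in range(3):
                    v[k] += 0.5 * (report.instrument_position[k] - v[k])
        self.pending.clear()
        return [list(v) for v in self.vertices]


def tick(server: Server, clients: list[Client], tips: list[Vec3], radius: float = 0.2):
    """One round: clients report collisions, server deforms, all clients sync."""
    for client, tip in zip(clients, tips):
        report = client.detect_collision(tip, radius)
        if report is not None:
            server.receive(report)
    state = server.step()
    for client in clients:
        client.apply_state(state)
    return state
```

For example, with two clients and a two-vertex mesh, one `tick` in which only the first client's instrument touches vertex 0 moves that vertex on the server and leaves both clients holding identical copies of the new state.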

Figure 1 shows an example scheme of the architecture in which two users are collaborating.


[Figure 1 diagram elements: Client/Server Machine, Client Machine, Local User Input, Collision Detection, Simulation Computation, Network, Collision and Instrument Data, Simulation and Instrument Data, Visual and Haptic Rendering, Local User Output.]


Fig. 1. Network elements and the functional relationships that compose the architecture implemented by our CNVSS. The right machine plays the server role and the left one plays the client role.
4 Description of the Adaptive System


An adaptive architecture can modify its parameters or structure using different mechanisms in order to maximize the objective for which it was developed [11]. For a CNVSS, the main objective is to guarantee collaboration between users performing a surgical training session as a team, despite the different conditions the system may face. This collaboration should be affected only by the level of expertise of the users performing the surgical training, and not by system conditions such as machine capabilities or network conditions.

The process of adapting an application involves two important concepts: context and adaptive mechanisms [12]. The first refers to everything that surrounds the application and can affect its status or behavior. The second refers to the mechanisms the application uses to adapt to the current context. For example, in a CNVSS the context comprises the network conditions (i.e., bandwidth, jitter, latency and packet loss rate), the computing capabilities of the user machines, and the preferences and roles of the users. The adaptive mechanisms, on the other hand, include the size of the data transmitted between user computers (i.e., the resolution of the anatomical structures simulated in the training session), the visualization quality of the anatomical structures, and the deformation algorithms, among others.
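To make the distinction between context and adaptive mechanisms concrete, the two can be modeled as separate records. This is a hypothetical Python sketch: the class names, field names, units and default values are assumptions chosen to mirror the variables listed in this section and in Table 1, not the CNVSS data model.

```python
from dataclasses import dataclass
from enum import Enum


class Role(Enum):
    """Role options listed for the CNVSS in Table 1."""
    ATTACHING = "attaching anatomical structures"
    CAMERA = "handling the camera"
    CUTTING = "cutting"
    CLIPPING = "applying clips"
    CAUTERIZING = "cauterizing"


@dataclass
class Context:
    """Everything around the application that can affect its behavior."""
    bandwidth_mbps: float
    jitter_ms: float
    latency_ms: float
    packet_loss_pct: float
    fps: float                      # proxy for the machine's computing capability
    role: Role
    display_preference: float       # 0 = low, 1 = high
    interaction_preference: float   # 0 = low, 1 = high

    def __post_init__(self) -> None:
        # Preferences are expressed on the 0..1 scale described in Table 1.
        for name in ("display_preference", "interaction_preference"):
            value = getattr(self, name)
            if not 0.0 <= value <= 1.0:
                raise ValueError(f"{name} must be in [0, 1], got {value}")


@dataclass
class Mechanisms:
    """Knobs the adaptive mechanisms can turn at run time (defaults are illustrative)."""
    mesh_level: int = 0                      # resolution level of the mesh (0 = finest)
    transmission_hz: float = 30.0            # how often the server sends the deformed state
    deformation_algorithm: str = "complex"   # e.g. "complex" or "simple"
    local_deformation: bool = False          # also compute deformation on the client
```

Keeping the two records separate mirrors the split in the text: monitoring only fills in a `Context`, while reconfiguration only modifies `Mechanisms`.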

An adaptive architecture consists of three components: (i) the monitoring component, (ii) the inference or adaptation machine, and (iii) the reconfiguration system component (Figure 2). Each of these components and its implementation in the SOFA framework is described next.


[Figure 2 diagram elements: User Role, User Preferences, Machine Capabilities, User Profile, Network Conditions, Monitoring Component, Inference Machine, Configuration Component.]

Fig. 2. Components taking part in the proposed adaptive architecture
4.1 Monitoring Component


This component gathers all the information related to the context of the system. Table 1 summarizes the variables that make up the context of the developed CNVSS.


Table 1. CNVSS Context Variables

Variable | Measurement method (units)
Jitter and packet loss | Iperf application (milliseconds / percentage)
Latency | Ping service, which returns the round-trip time of a data packet transmitted between two nodes in the network (milliseconds)
Bandwidth | Pathload application [14] (Mbit/s)
Frames per second | Calculated by the SOFA framework for each of the machines
User role and preferences | The role options in the CNVSS are attaching anatomical structures, handling the camera, cutting, applying clips and cauterizing. For user preferences, the user chooses between display and interaction preferences, on a scale from 0 (low preference) to 1 (high preference)


Furthermore, this component defines the time interval for updating the con- 
text. 
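The cooperation between the three components, with the monitoring component setting the update interval, could look like the following loop. This is an assumed structure rather than the authors' code: `AdaptiveLoop` and its injected callables are hypothetical, and the context and actions are plain dictionaries for brevity.

```python
import time
from typing import Any, Callable


class AdaptiveLoop:
    """Periodically runs monitor -> infer -> reconfigure."""

    def __init__(
        self,
        monitor: Callable[[], dict[str, Any]],
        infer: Callable[[dict[str, Any]], dict[str, Any]],
        reconfigure: Callable[[dict[str, Any]], None],
        interval_s: float,
    ) -> None:
        self.monitor = monitor
        self.infer = infer
        self.reconfigure = reconfigure
        self.interval_s = interval_s  # context update interval chosen by the monitoring component

    def run(self, iterations: int) -> None:
        for _ in range(iterations):
            started = time.monotonic()
            context = self.monitor()        # 1. gather the current context
            actions = self.infer(context)   # 2. decide which mechanisms to apply
            if actions:                     # 3. reconfigure only when something must change
                self.reconfigure(actions)
            # Sleep for whatever is left of the update interval.
            time.sleep(max(0.0, self.interval_s - (time.monotonic() - started)))
```

Keeping the three stages as injected callables lets the inference rules be replaced (for instance by the model-based inference machine proposed as future work) without touching monitoring or reconfiguration.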


4.2 Inference or Adaptation Machine Component 


This component infers which mechanisms are required to adapt the simulation, given the context determined by the monitoring component. Currently, the inference component is based on an expert system consisting of a set of rules that compare context variables with predefined thresholds and determine the best option available for each of the adaptation mechanisms.
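A threshold-based expert system of this kind can be reduced to a handful of rules. The sketch below is hypothetical: the paper does not report its thresholds or its exact rule set, so the threshold values, the `Decision` record and the mapping from each context variable to one of the three mechanisms of Section 4.3 are illustrative assumptions.

```python
from dataclasses import dataclass

# Hypothetical thresholds: the paper does not report the values it uses.
MIN_BANDWIDTH_MBPS = 10.0
MIN_FPS = 30.0
MAX_LATENCY_MS = 100.0


@dataclass
class Decision:
    reduce_data_quality: bool = False   # coarser transmitted mesh and/or lower send rate
    simplify_simulation: bool = False   # lower resolution or cheaper algorithm locally
    local_deformation: bool = False     # compute deformation on the client as well


def infer(context: dict[str, float]) -> Decision:
    """Each rule compares one context variable with a fixed threshold."""
    decision = Decision()
    if context["bandwidth_mbps"] < MIN_BANDWIDTH_MBPS:
        decision.reduce_data_quality = True
    if context["fps"] < MIN_FPS:
        decision.simplify_simulation = True
    if context["latency_ms"] > MAX_LATENCY_MS:
        decision.local_deformation = True
    return decision
```

Because every rule is independent, several mechanisms can be triggered at once, e.g. low bandwidth together with high latency.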


4.3 Reconfiguration System Component 


This component applies the mechanisms and actions required by the system, as determined by the inference machine. Our system has three different mechanisms that vary the parameters of the simulation and allow the system to adapt to the context conditions while maximizing collaboration among users:
– Changing resolutions and algorithms: this mechanism allows the resolution and the algorithms used to simulate the anatomical structures to be changed interactively on each of the user machines taking part in the collaborative training session. Using this strategy, it is possible to vary the computing capacity required by each machine to run the simulation.

– Quality of data transmission: this mechanism changes the amount of data transmitted per unit time over the network from the server-role machine to the client-role machines. The quality of the transmitted data is changed using two methods: (i) the mapping method, which decreases the resolution of the mesh transmitted from server to client, and (ii) varying the transmission frequency of the data.

– Local and remote computation: by default, the system computes the deformation centrally, and the result is transmitted to the clients, which update their simulation state. However, when required, this mechanism allows clients to switch from the centralized mode to calculating the deformation locally and then updating it with the computation performed by the server. In this way, the response time of the simulator is decreased for locally performed deformations, and the possible effects of latency on user collaboration are mitigated.
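The effect of the quality-of-data-transmission mechanism can be estimated with simple arithmetic. The sketch assumes, hypothetically, that the server sends the x, y and z coordinates of every vertex as 4-byte floats and ignores protocol headers and compression; the function names, vertex counts and rates are illustrative, not values from the CNVSS.

```python
def state_rate_mbps(num_vertices: int, frequency_hz: float, bytes_per_coord: int = 4) -> float:
    """Payload bit rate for sending x, y, z of every vertex at a given rate."""
    bits_per_update = num_vertices * 3 * bytes_per_coord * 8
    return bits_per_update * frequency_hz / 1e6


def choose_transmission(levels: list[int], rates_hz: list[float], budget_mbps: float):
    """Return the first (vertices, Hz) pair that fits the budget, preferring
    mesh resolution over update rate; None if nothing fits."""
    for vertices in sorted(levels, reverse=True):    # finest mesh first
        for hz in sorted(rates_hz, reverse=True):     # fastest rate first
            if state_rate_mbps(vertices, hz) <= budget_mbps:
                return vertices, hz
    return None
```

Under these assumptions a 5,000-vertex mesh sent at 30 Hz needs 14.4 Mbit/s, so with a 5 Mbit/s budget `choose_transmission([5000, 1250], [30, 10], 5.0)` keeps the fine mesh and drops to 10 Hz (4.8 Mbit/s). Preferring resolution over update rate is an arbitrary choice made for the example.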


5 Implementation of the Network Architecture and 
Adaptive System 


The SOFA framework allows the development and integration of new components in order to extend the basic functionality it provides. This framework provides all the elements needed to develop a surgical scenario, such as collision detection, visual and haptic rendering, deformation and topological change algorithms, and data structures for loading and storing the geometry of the anatomical structures to be simulated. However, the framework lacks the capabilities needed to develop a CNVSS and adaptive components. In this section we describe each of the components developed for this purpose:


– CNVSS-MW is a middleware layer that provides three main networking capabilities to our CNVSS: (i) it organizes the event data into messages that can be transmitted using a specific application-level communication protocol developed for this purpose, (ii) it defines, depending on the type of message, whether the message must be sent using UDP (User Datagram Protocol) or TCP (Transmission Control Protocol), and (iii) it controls the connection state and manages the session between clients and server.

– MultilevelThetrahedralHybridForceField allows switching between two algorithms of different complexity to compute the deformation of the anatomical structures.

– MultiResolutionMeshLoader and MultilevelMesh: these components are responsible for loading and storing data structures at different resolutions for each of the anatomical structures simulated in the surgical scenario.


284 C. Diaz et al. 


– MultilevelFixedConstraint is a component that allows applying fixed mechanical constraints to multiresolution meshes.

– MultilevelBarycentricMapping applies the barycentric mapping method to data structures with different resolutions.

– AttachingController, CarvingController and AttachingClipController receive all the collision and basic surgical operation data sent by each client and apply them, modifying the simulation state on the server side. Additionally, they determine whether a local surgical instrument is colliding with an anatomical structure and, if so, store all the information related to the collision and to the basic surgical operation performed (attaching, carving, clip attaching, among others) so that the CNVSS-MW component can send it to the server.

– The functionality of OmniDriver and RemoteOmniDriver is described in [15].

– NetworkController is a key component of our architecture that runs on both the client and the server side as an independent thread. Its function is to read the state of the components described above and determine whether there is an event to be transmitted by CNVSS-MW to the clients or to the server.
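The message handling attributed to CNVSS-MW can be sketched as a small framing layer plus a transport policy. Everything here is hypothetical: the message types, the 7-byte header layout and the rule that session-control and collision messages use TCP while high-rate state updates use UDP are assumptions for illustration; the paper only states that the choice depends on the message type.

```python
import struct
from enum import IntEnum


class MsgType(IntEnum):
    CONNECT = 1
    DISCONNECT = 2
    COLLISION = 3        # collision + surgical operation data, client -> server
    INSTRUMENT_POSE = 4  # high-rate instrument position/orientation
    DEFORMED_STATE = 5   # high-rate organ state, server -> clients


# Assumed policy: messages that must not be lost go over TCP;
# frequently refreshed state goes over UDP.
RELIABLE = {MsgType.CONNECT, MsgType.DISCONNECT, MsgType.COLLISION}

# Message type (1 byte), client id (2 bytes), payload length (4 bytes), network byte order.
HEADER = struct.Struct("!BHI")


def transport_for(msg_type: MsgType) -> str:
    return "TCP" if msg_type in RELIABLE else "UDP"


def encode(msg_type: MsgType, client_id: int, payload: bytes) -> bytes:
    return HEADER.pack(int(msg_type), client_id, len(payload)) + payload


def decode(packet: bytes) -> tuple[MsgType, int, bytes]:
    msg_type, client_id, length = HEADER.unpack_from(packet)
    payload = packet[HEADER.size:HEADER.size + length]
    return MsgType(msg_type), client_id, payload
```

Sending only loss-tolerant, frequently refreshed state over UDP is a common design choice, because a newer update supersedes a lost one, whereas a lost connection or surgical-operation message would leave clients and server inconsistent.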


6 Results 


A preliminary test was performed to evaluate the developed CNVSS and to determine whether it is possible to maintain the collaboration of users conducting a training session as a team while the context is changed to degrade conditions. Ten users, grouped in teams of two members, performed the experiment. The surgical procedure trained was a cholecystectomy, and the users were experts in the surgical task trained (Figure 3). Each team performed the procedure twice: (i) using the CNVSS described in this work and (ii) using a non-adaptive version of the CNVSS. The order of the experiments was determined randomly. Network conditions and machine capabilities were varied at the beginning of the experiments so that they were not suitable for guaranteeing good collaboration. Task completion time and number of errors were measured.


Tables 2 and 3 show the results obtained in the experiment described.


Table 2. Preliminary results for the task completion time (seconds) 


Experiment | Adaptive CNVSS | Non-adaptive CNVSS
Fig. 3. Two users performing a cholecystectomy using the developed CNVSS. Each instrument is controlled by one user.


Table 3. Preliminary results for the number of errors 


Experiment | Adaptive CNVSS | Non-adaptive CNVSS


4 10 
11 
15 


2 
6 
4 9 
3 20 


From the results obtained, it can be observed that users who performed the experiment with the adaptive CNVSS had shorter task completion times and fewer errors. Since the users were experts in the execution of the task, it can be inferred that the metrics were affected only by system conditions (i.e., machine capabilities or network conditions).


7 Conclusions and Future Work 


This work describes the conceptual development of a CNVSS that is able to adapt to the current context and to maintain user collaboration when machine capabilities and network conditions are not optimal. Three important components are part of the system: the monitoring component, the inference machine and the reconfiguration component. Additionally, the CNVSS implementation using the SOFA framework is described, and its integration with the hybrid client-server architecture is presented. Finally, an experiment to evaluate the developed CNVSS was performed. From the results obtained, it can be concluded that the adaptive CNVSS makes user collaboration more effective, decreasing both task completion time and the number of errors.

As future work, we propose developing an inference machine based on robust mathematical models, including the dynamic adaptation of the CNVSS.
Abstract. Virtual reality serious game platforms have been developed to enhance the effectiveness of rehabilitation protocols for people with motor skill disorders. Such systems increase the user's motivation to perform the recommended in-home therapy exercises, but typically do not incorporate an objective method for assessing the user's outcome metrics. We expand on a commonly used human modeling method, Fitts' law, which predicts the amount of time needed to complete a task, and apply it as an assessment method for virtual environments. During game play, we compare the user's movement time to the predicted value as a means of assessing the individual's kinematic performance. Taking into consideration the structure of virtual gaming environments, we expand the nominal Fitts' model into one that makes accurate time predictions for three-dimensional movements. Results show that the three-dimensional refinement of Fitts' model makes better predictions than its two-dimensional counterpart when users interact with virtual gaming platforms.


Keywords: Fitts' law, virtual reality games, physical therapy and rehabilitation, linear modeling.


1 Introduction 


Gaming platforms for serious games play an important role in the rehabilitation field [1]. Such systems have been developed to increase users' motivation to perform their recommended in-home exercises [2], [3]. Moreover, previous research has shown that these systems can be used to calculate kinematic metrics associated with an individual's movement profile. In [4], a prototype rehabilitation game was presented that used the Kinect system to analyze biomechanical movements of the upper extremities, represented as range of motion and posture data. In [5], an augmented reality system that enabled 3D reaching movements within the environment was presented; the authors derived a set of kinematic data represented as movement time and end-effector curvature values. Finally, [6] evaluated the probability of recognizing six different movement gestures, useful for rehabilitation, when using a virtual gaming system. Although virtual systems such as these show the viability of collecting kinematic movement data, they do not provide a quantifiable means of determining the quality of that movement. As such, we focus on incorporating a methodology within


R. Shumaker and S. Lackey (Eds.): VAMR 2014, Part II, LNCS 8526, pp. 287-297, 2014.
© Springer International Publishing Switzerland 2014 


288 S. Garcia-Vergara and A.M. Howard 


existing virtual reality (VR) gaming platforms that objectively evaluates the outcome 
metrics of an individual during game play. 

A common symptom experienced by individuals with a motor skill disorder is slow movement [7]. As such, movement time (MT), defined as the time needed to complete a given task, is a kinematic parameter of interest in rehabilitation interventions because it directly correlates with the speed of the individual's movements. In this paper, we focus on predicting the MT needed to complete a task in any VR gaming platform. We use the prediction as the ground-truth value against which the user's nominal MT is quantitatively compared, as a means of assessing their kinematic performance. Because of its wide adoption, we use Fitts' law [8], a model of human movement. This law predicts the amount of time a user needs to reach a given target in a virtual environment. Although refinements have been made to the original model to improve the accuracy of Fitts' law, to the best of our knowledge none has been derived specifically for predicting the time of three-dimensional (3D) movements, which are inevitable when interacting with a VR system or a serious gaming platform.

We therefore propose a new variation of the Fitts' law model that takes 3D spatial movements into consideration. Section 2 presents a short literature review of previous variations and modifications made to the original model. Section 3 discusses in detail the procedure used to create our final model. Section 4 presents the results obtained in testing sessions with human participants. Finally, we analyze the results in Section 5 and make our concluding remarks in Section 6.


2 Background 


In HCI, Fitts' law has been used to predict the amount of time a user needs to complete a task, in order to design better human-computer interaction (HCI) interfaces or to determine the best input method for a digital system. Card et al. [9] used Fitts' law to evaluate four devices with respect to how rapidly they can be used to select text on a CRT display. Walker et al. [10] compared selection times between walking menus and pull-down menus. Gillian et al. [11] used Fitts' law to examine the time needed to select text using a movement sequence of pointing and dragging.

Within these applications of Fitts' law, HCI researchers have also developed several refinements to improve the accuracy of the model. MacKenzie [12] summarized some refinements that deal with the definition of the difficulty of a task. In the original model, the difficulty of a movement task (DI, for "difficulty index") was quantified by (1):

$$DI = \log_2\left(\frac{2A}{W}\right) \tag{1}$$


where A is the distance to move and W is the width of the target to reach. Welford [13] proposed a new formulation for DI (2) after noting a consistent departure of data points above the regression line for 'easy' tasks (i.e., DI < 3 bits).

$$DI = \log_2\left(\frac{A}{W} + 0.5\right) \tag{2}$$
Moreover, a preferred formulation (3), known as the Shannon formulation [14], is commonly used because it provides a better fit with observations, mimics the information theorem underlying Fitts' law, and always yields a positive DI.

$$DI = \log_2\left(\frac{A}{W} + 1\right) \tag{3}$$
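The three DI formulations can be compared directly. This short Python sketch implements Eqs. (1) to (3) as written; the function names are hypothetical.

```python
import math


def di_original(a: float, w: float) -> float:
    """Eq. (1): DI = log2(2A / W)."""
    return math.log2(2 * a / w)


def di_welford(a: float, w: float) -> float:
    """Eq. (2): DI = log2(A / W + 0.5)."""
    return math.log2(a / w + 0.5)


def di_shannon(a: float, w: float) -> float:
    """Eq. (3): DI = log2(A / W + 1)."""
    return math.log2(a / w + 1)


# An 'easy' task in which the target is wider than the distance to it.
for name, f in [("original", di_original), ("Welford", di_welford), ("Shannon", di_shannon)]:
    print(f"{name:>8}: {f(25.0, 100.0):+.3f} bits")
```

For A = 25 and W = 100 (same units), it prints -1.000, -0.415 and +0.322 bits: only the Shannon formulation stays positive, because its argument A/W + 1 is always greater than 1.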


To the best of our knowledge, none of the previous studies has used Fitts' law for human movement assessment, and the literature supporting the theory behind Fitts' law is limited to two-dimensional (2D) movements. In this paper we discuss a methodology for building a model that: 1) predicts movement time for three-dimensional movements, and 2) can be used as a tool for quantitatively assessing an individual's kinematic performance.


3 Methodology 


3.1 Serious Game Platforms 


In this paper we focus on expanding the functionality of serious game platforms used for rehabilitation by incorporating an objective kinematic assessment methodology. We use a previously developed platform called Super Pop VR™ [15], [16], which combines interactive game play for evoking user movement with an objective and quantifiable kinematic algorithm that analyzes the user's upper-arm movements in real time. While engaged with the game, users are asked to move their arms to 'pop' virtual bubbles of various sizes that appear at various locations in the virtual environment. As the bubbles appear on screen, a 3D depth camera maps the user's movements into the virtual environment. These movements correspond to movement tasks that require reaching a target from a specified initial position, which is the type of task evaluated by Fitts' law. Figure 1 shows a comparison between a reaching task evaluated by Fitts' law (Figure 1a) and an example of a reaching exercise in the Super Pop VR™ platform (Figure 1b). The ability to reach is critical for most, if not all, activities of daily living, such as feeding, grooming and dressing [17], and failure to recover upper-extremity function can lead to depression [18]. As such, reaching movements, which correspond to reaching exercises, are of interest in various rehabilitation scenarios.




Fig. 1. Comparison between a common movement task evaluated by Fitts' law (where A is the distance traveled and W is the width of the target) (a), and a reaching exercise in the Super Pop VR™ platform (b). Figure (a) adapted from [12].
Applying Fitts' law to the Super Pop VR™ game, we focus on predicting the amount of time a user needs to move between two displayed 'bubbles' as a function of the distance between the virtual objects and the width (diameter) of the target 'bubble'. Given the nature of the platform, users move their arms in 3D space in order to interact with the virtual objects on the screen. We therefore first need to build an appropriate model (i.e., define a DI) that is able to make accurate time predictions for 3D movements.


3.2 Linear Models


Fitts' law predicts movement time as a linear function of the difficulty index (DI) of a task (4). Because of its wide adoption and popularity, we adhere to the Shannon formulation of DI in (3), resulting in a model of the form (5).

$$MT = a + b \cdot DI \tag{4}$$

$$MT = a + b \cdot \log_2\left(\frac{A}{W} + 1\right) \tag{5}$$


where MT is the predicted movement time (in milliseconds), A is the distance to move, W is the width of the target to reach, and a and b are the intercept and the slope of the model, respectively. Building a Fitts' model means fitting the slope and intercept to MT data collected from users interacting with the system. In general, a number of movement tasks are defined by selecting different combinations of traveled distances and target widths, and the corresponding DI of each task is calculated. Human MT data are collected for each defined task, and a linear regression between the per-task MT averages and their corresponding DIs is performed to compute the slope and intercept of the model.
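The model-building procedure (average MT per task, then a linear regression of those averages on DI) can be sketched with ordinary least squares from the Python standard library. The function names and the `(task_id, DI, MT)` input layout are assumptions; R² is computed as 1 - SS_res/SS_tot, which for a straight-line least-squares fit equals the squared Pearson correlation.

```python
from collections import defaultdict
from statistics import mean


def task_averages(trials):
    """trials: iterable of (task_id, di_bits, mt_ms). Returns ([DI], [mean MT]) per task."""
    di_of, mts = {}, defaultdict(list)
    for task_id, di, mt in trials:
        di_of[task_id] = di
        mts[task_id].append(mt)
    tasks = sorted(di_of, key=di_of.get)   # order tasks by difficulty
    return [di_of[t] for t in tasks], [mean(mts[t]) for t in tasks]


def fit_line(x, y):
    """Ordinary least squares y = a + b*x. Returns (a, b, r_squared)."""
    x_bar, y_bar = mean(x), mean(y)
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    b = sxy / sxx
    a = y_bar - b * x_bar
    ss_res = sum((yi - (a + b * xi)) ** 2 for xi, yi in zip(x, y))
    ss_tot = sum((yi - y_bar) ** 2 for yi in y)
    return a, b, 1.0 - ss_res / ss_tot
```

Averaging per task before regressing, as the text describes, gives every task equal weight regardless of how many trials were recorded for it.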

Since we are interested in building a model that is appropriate for 3D movements, the distance traveled is now the 3D Euclidean distance between the initial position of the user's hand and the target. However, a complication arises because the positions of the 'bubbles' in the virtual platform are defined in a 2D space, which means that the movement tasks are selected based on 2D data. We therefore built two linear models. The first model correlates the 2D pixel distance between the virtual objects with the user's 3D path length (PL). We then use this model to calculate the distance-traveled parameter in (5) and create our second model: the correlation between the DI of a movement task and the time needed to complete it.
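The two distance notions used here can be computed from tracked 3D hand positions. How the authors derived PL from the depth camera data is not specified; as an assumption, this sketch treats PL as the length of the polyline through consecutive hand samples (in mm) and contrasts it with the straight-line Euclidean distance. The function names are hypothetical.

```python
import math

Point = tuple[float, float, float]


def straight_line_distance(start: Point, target: Point) -> float:
    """3D Euclidean distance between the hand's start position and the target."""
    return math.dist(start, target)


def path_length(samples: list[Point]) -> float:
    """Length of the polyline through consecutive tracked hand positions."""
    return sum(math.dist(p, q) for p, q in zip(samples, samples[1:]))
```

For a hand that moves 300 mm to the right and then 400 mm up, the straight-line distance is 500 mm while the path length is 700 mm.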

[Figure 2 plot: Path Length Average vs 2D Pixel Distance; x-axis: 2D Distance [pixels] (0 to 300); y-axis: Path Length [mm].]

Fig. 2. 3D path length averages of collected human data versus 2D pixel distance between virtual objects. The figure also shows the final linear correlation (continuous line) between the two variables.

[Figure 3 plot: Movement Time Averages vs Difficulty Index; x-axis: Difficulty Index [bits] (0 to 3.5); y-axis: Movement Time [ms].]

Fig. 3. Movement time averages collected from human data versus the corresponding DIs for each task. The figure also shows the final linear correlation (continuous line) between the two variables.

To collect the human MT data needed to develop the model, we recruited seven able-bodied adults to interact with the Super Pop VR™ game. Sixteen tasks were empirically selected, and each participant was assigned to repeatedly complete eight of them. We collected, on average, 24 ± 5 PL and MT points for each task. To increase the correlation factor between the variables of both models, we assumed that both datasets follow a Gaussian distribution, and thus considered only data points within one standard deviation of the mean of the complete dataset. Moreover, taking into consideration the learning curve of the platform, the participants were required to interact with the system twice before the actual data collection started. This practice is intended to eliminate possible errors due to unfamiliarity with the game.
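The one-standard-deviation screening can be written in a few lines. This is a sketch under assumptions: the paper does not say whether the population or the sample standard deviation was used (population is assumed here), and the function name is hypothetical.

```python
from statistics import mean, pstdev


def within_one_sigma(values: list[float]) -> list[float]:
    """Keep only the values within one standard deviation of the dataset mean."""
    mu = mean(values)
    sigma = pstdev(values)   # population SD (an assumption; the text does not specify)
    return [v for v in values if abs(v - mu) <= sigma]
```

For example, `within_one_sigma([1, 2, 3, 4, 100])` keeps `[1, 2, 3, 4]`: the mean is 22 and the population standard deviation is about 39, so only 100 falls outside the band.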

A linear regression was performed on the collected path length data to correlate the participants' 3D PL with the selected 2D pixel distance between 'bubbles' (Figure 2), which yielded (6) with a correlation factor of R² = 0.9703. MacKenzie [12] argues that correlations above 0.900 are considered very high for any experiment involving measurements on human subjects. Thus, we can conclude that the PL model provides a good description of the observed behavior.

$$PL = 3.5651 \cdot D_2 - 174.3 \tag{6}$$

where $D_2$ is the 2D pixel distance between the two virtual targets, and PL is the 3D path length (in mm) traveled by the user for the corresponding $D_2$.

A second linear regression was performed on the collected MT data to correlate the participants' MT with the DI of the corresponding tasks (Figure 3), which yielded (7) with a correlation factor of R² = 0.7428. Although this correlation factor is not considered 'very high', it still suggests that the MT model also provides a good description of the observed behavior. The DI of each task was calculated using (5), with the PL model (6) substituted for the traveled distance.


$$MT = 208.97 \cdot DI + 435.02 \tag{7}$$


where DI is the difficulty index of a given task and MT is the movement time prediction for the task (in milliseconds). It is important to note that (7) is tied to the selected definition of DI; if a different definition were used, the MT model would have to be retrained.

Combining equations (5), (6) and (7), we obtain the final MT model (8), which expresses MT as a function of the 2D pixel distance between two virtual objects by substituting the linear PL model (6), built from human PL data, for the traveled distance.

$$MT = 208.97 \cdot \log_2\left(\frac{3.5651 \cdot D_2 - 174.3}{W} + 1\right) + 435.02 \tag{8}$$


where $D_2$ is the 2D pixel distance between the two virtual objects of the given task, W is the width of the second (target) virtual object, and MT is the movement time prediction for the given task (in milliseconds). Since the argument of the logarithm has to be unitless and the PL model computes values in millimeters, the width of the target must also be expressed in millimeters.

To better determine the accuracy of the 3D Fitts' model, we also created a standard 2D Fitts' model and compared the predictions of both models with the nominal MT values collected from the participants. The 2D model was built in a similar fashion to the 3D model, using the same tasks. The difference lies in the fact that the DIs of the tasks were computed directly from the 2D pixel distance, instead of applying the PL model. A linear regression was applied to the collected MT data to correlate the participants' MTs with the DIs of the corresponding tasks, which yielded (9) with a correlation factor of R² = 0.7346.

$$MT = 245.2 \cdot DI + 377.42 \tag{9}$$
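Eqs. (6), (8) and (9) translate directly into code, which makes it easy to compare the two models for a given task. The coefficients come from the text; the function names are hypothetical, and the target width is assumed to be in millimeters for the 3D model (as required above) and in pixels for the 2D model, whose width unit the text does not state.

```python
import math


def path_length_mm(d2_px: float) -> float:
    """Eq. (6): predicted 3D path length (mm) from the 2D pixel distance."""
    return 3.5651 * d2_px - 174.3


def mt_3d_ms(d2_px: float, w_mm: float) -> float:
    """Eq. (8), i.e. Eqs. (5)-(7) combined; target width W in millimeters."""
    pl = path_length_mm(d2_px)
    if pl <= 0:
        # Eq. (6) is not positive below about 48.9 px, where the model is undefined.
        raise ValueError("2D distance too small for the PL model")
    return 208.97 * math.log2(pl / w_mm + 1) + 435.02


def mt_2d_ms(d2_px: float, w_px: float) -> float:
    """Eq. (9) with a Shannon DI computed directly from the pixel distance."""
    return 245.2 * math.log2(d2_px / w_px + 1) + 377.42
```

For example, with $D_2$ = 100 px the PL model gives 182.21 mm, so a target 182.21 mm wide yields DI = 1 bit and a predicted MT of 643.99 ms. The guard reflects that (6) becomes negative for distances below about 48.9 px (174.3 / 3.5651).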


4 Experimental Results 


The final model was tested with seventeen able-bodied high school students. Seven females and ten males, aged 15 to 16 years (mean age = 15.5 years, standard deviation = 0.5 years), were recruited to interact with the Super Pop VR™ game in order to validate that the proposed methodology for creating a Fitts' law model is appropriate for 3D movements. The participants interacted in an office setting, which was kept constant to maintain consistency. The virtual reality game was projected onto a large screen via a projector connected to a laptop PC. The chair on which the participants sat was 41 cm tall, the distance between the user's chair and the depth camera was 190 cm, and the distance between the projector and the screen was 170 cm. Each participant was asked to play four games (two per arm), and PL and MT were collected for a total of six trials per arm.

Taking into consideration the learning curve of the platform, we evaluated the last trial of each participant's dominant hand. Table 1 summarizes a comparison between the participants' nominal MT for the selected trial and the movement time predictions made by the 2D and 3D models. The prediction error is defined as the absolute difference between the participant's MT and the prediction. The table also shows which model made the best prediction in each case; the model that best fits a given scenario is the one with the smallest difference between the actual MT and its prediction. Figure 4 presents the results of Table 1 graphically.
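The selection rule behind the Best Predictor column is a comparison of absolute errors. In this hypothetical helper, ties are resolved in favor of the 3D model, a choice the text does not specify; the example rows are taken from Table 1.

```python
def best_predictor(actual_ms: float, pred_2d_ms: float, pred_3d_ms: float):
    """Return ('2D' or '3D', |error 2D|, |error 3D|); a tie goes to the 3D model."""
    err_2d = abs(actual_ms - pred_2d_ms)
    err_3d = abs(actual_ms - pred_3d_ms)
    return ("2D" if err_2d < err_3d else "3D"), err_2d, err_3d


# Participants 1 and 7 from Table 1: (nominal MT, 2D prediction, 3D prediction)
for participant, row in {1: (1129.69, 731.67, 713.42), 7: (958.87, 583.42, 994.83)}.items():
    winner, e2, e3 = best_predictor(*row)
    print(participant, winner, round(e2, 2), round(e3, 2))
```

Run on these two participants, it reproduces the Table 1 entries: the 2D model wins for participant 1 (398.02 ms vs 416.27 ms) and the 3D model for participant 7 (35.96 ms vs 375.45 ms).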

Table 2 shows the progression of MT values over the six trials of Participant 2's dominant hand. Like Table 1, Table 2 compares Participant 2's nominal MT with the predictions made by the 2D and 3D models, and indicates which model makes the most accurate prediction based on the absolute difference between the actual MT and each prediction.

Table 3 summarizes how the models behave for clearly 2D and clearly 3D movements. The data collected from the last trial of Participant 7's dominant hand are considered 3D movements, while the data collected from the last trial of Participant 16's dominant hand are considered 2D movements. The table shows the MT predictions made by both models for the two scenarios, together with the participants' actual MT.


5 Analysis 


It is important to keep in mind that the linear models were built with data collected from adults, whereas they were tested with data collected from high school teenagers. This raises the possibility of over- (or under-) predicting path length (PL) and movement time (MT) values. Previous research has shown that kinematic capabilities, among other parameters, are a nonlinear function of the individual's age [19]. As such, there are some scenarios in which neither the 2D nor the 3D model makes accurate MT predictions. For example, Participant 5 took almost twice as long as both models predicted (Table 1). Moreover, we collected only 24 ± 5 data points per task, while studies such as [20] collected 470 trials per task. More data would likely result in a higher correlation and, thus, more stable models.

Another important observation is that our 3D Fitts' model behaves like the known two-dimensional model when the movements are (almost) planar. Table 1 shows that both models make very similar predictions in these scenarios, suggesting that there is no deterioration when applying the 3D model to 2D movements. More importantly, in scenarios where the individual makes 3D movements, the 3D model makes more accurate predictions than the 2D model. Table 3 shows an example of such scenarios. For a case in which the movements were in 2D space (Participant 16), the predictions made by the two models are similar: they differ by 5.22 ms, the 3D prediction differs from the actual MT by 208.28 ms, and the 2D prediction differs from the actual MT by 213.51 ms.

Similarly, Table 3 shows that, for a case in which the movements were in 3D space (Participant 7), the prediction made by the 3D model is more accurate than that of the 2D model. The two predictions differ by 411.41 ms (considerably more than for the 2D movements), the 3D prediction differs from the actual MT by 35.96 ms, and the 2D prediction differs from the actual MT by 375.45 ms. These results show that, for 3D movements, the proposed Fitts' model, which includes a PL model, makes more accurate MT predictions than the original Fitts' model.


Table 1. Comparison between the participants' MT nominal values and the predictions made by the 2D and 3D models to determine the best predictor for each scenario

Participant | MT [ms] | 2D DI [bits] | 2D MT_p [ms] | Difference [ms] | 3D DI [bits] | 3D MT_p [ms] | Difference [ms] | Best Predictor
1 | 1129.69 | 1.44 | 731.67 | 398.02 | 1.33 | 713.42 | 416.27 | 2D
2 | 901.69 | 1.46 | 735.63 | 166.05 | 1.54 | 757.77 | 143.92 | 3D
3 | 1180.66 | 1.01 | 624.60 | 556.07 | 2.73 | 1005.02 | 175.64 | 3D
4 | 980.43 | 1.27 | 689.89 | 290.54 | 1.48 | 743.60 | 236.83 | 3D
5 | 1413.22 | 1.68 | 789.09 | 624.13 | 1.48 | 745.13 | 668.09 | 2D
6 | 1283.64 | 1.54 | 754.32 | 529.32 | 1.60 | 770.16 | 513.48 | 3D
7 | 958.87 | 0.84 | 583.42 | 375.45 | 2.68 | 994.83 | 35.96 | 3D
8 | 1071.39 | 1.46 | 734.25 | 337.14 | 1.99 | 850.30 | 221.09 | 3D
9 | 978.12 | 1.17 | 665.26 | 312.85 | 1.10 | 664.87 | 313.25 | 2D
10 | 913.74 | 1.80 | 819.61 | 94.13 | 1.64 | 777.94 | 135.80 | 2D
11 | 1167.51 | 1.50 | 745.69 | 421.83 | 1.34 | 715.61 | 451.91 | 2D
12 | 943.35 | 1.50 | 745.43 | 197.92 | 1.81 | 812.25 | 131.10 | 3D
13 | 926.22 | 1.86 | 834.28 | 91.93 | 1.76 | 802.81 | 123.41 | 2D
14 | 949.05 | 1.31 | 699.43 | 249.62 | 2.40 | 935.79 | 13.26 | 3D
15 | 1352.22 | 1.59 | 767.88 | 584.34 | 1.56 | 760.11 | 592.11 | 2D
16 | 934.13 | 1.40 | 720.62 | 213.51 | 1.39 | 725.84 | 208.28 | 3D
17 | 1046.67 | 1.55 | 756.61 | 290.05 | 1.46 | 739.80 | 306.86 | 2D
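The Best Predictor column can be recomputed from the actual and predicted MT values alone: the best predictor is the model with the smaller absolute error. The sketch below does exactly that on the values transcribed from Table 1. The helper names are illustrative and not part of the original work. Recomputed errors may differ from the printed ones by 0.01 ms because of rounding (e.g. participant 2 gives 166.06 ms instead of 166.05 ms).

```python
# (participant, actual MT, 2D-model prediction, 3D-model prediction), all in ms,
# transcribed from Table 1.
TABLE1 = [
    (1, 1129.69, 731.67, 713.42), (2, 901.69, 735.63, 757.77),
    (3, 1180.66, 624.60, 1005.02), (4, 980.43, 689.89, 743.60),
    (5, 1413.22, 789.09, 745.13), (6, 1283.64, 754.32, 770.16),
    (7, 958.87, 583.42, 994.83), (8, 1071.39, 734.25, 850.30),
    (9, 978.12, 665.26, 664.87), (10, 913.74, 819.61, 777.94),
    (11, 1167.51, 745.69, 715.61), (12, 943.35, 745.43, 812.25),
    (13, 926.22, 834.28, 802.81), (14, 949.05, 699.43, 935.79),
    (15, 1352.22, 767.88, 760.11), (16, 934.13, 720.62, 725.84),
    (17, 1046.67, 756.61, 739.80),
]


def best_predictor(actual: float, pred_2d: float, pred_3d: float) -> str:
    """Return the model whose prediction has the smaller absolute error."""
    return "2D" if abs(actual - pred_2d) < abs(actual - pred_3d) else "3D"


def summarize(rows):
    """Map participant -> (2D error, 3D error, best model), errors rounded to 0.01 ms."""
    out = {}
    for pid, mt, p2, p3 in rows:
        out[pid] = (round(abs(mt - p2), 2), round(abs(mt - p3), 2),
                    best_predictor(mt, p2, p3))
    return out


if __name__ == "__main__":
    result = summarize(TABLE1)
    for pid, (e2, e3, best) in result.items():
        print(f"{pid:2d}  2D err {e2:7.2f}  3D err {e3:7.2f}  best {best}")
    wins = [b for _, _, b in result.values()]
    print("2D best:", wins.count("2D"), " 3D best:", wins.count("3D"))
```

On these values the 3D model is the better predictor for 9 of the 17 participants.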


Table 2. Progression of MT values over the six trials of participant 2's dominant hand

Trial | MT [ms] | 2D DI [bits] | 2D MT_p [ms] | Difference [ms] | 3D DI [bits] | 3D MT_p [ms] | Difference [ms] | Difference between predictions [ms] | Best Predictor
1 | 1055.81 | 1.38 | 716.67 | 339.13 | 1.59 | 768.14 | 287.66 | 51.47 | 3D
2 | 1352.83 | 1.31 | 697.57 | 655.26 | 1.32 | 711.19 | 641.64 | 13.62 | 3D
3 | 1010.50 | 1.76 | 808.77 | 201.73 | 1.89 | 830.98 | 179.52 | 22.21 | 3D
4 | 972.78 | 1.51 | 748.32 | 224.45 | 1.50 | 748.91 | 223.87 | 0.59 | 3D
5 | 898.18 | 1.43 | 727.28 | 170.89 | 1.87 | 824.94 | 73.23 | 97.66 | 3D
6 | 901.69 | 1.46 | 735.63 | 166.05 | 1.54 | 757.77 | 143.92 | 22.14 | 3D


Table 3. Predictions made by both models when applied to clear 2D and 3D movements

Measure | 2D Movement (Participant 16) | 3D Movement (Participant 7)
Prediction from 2D model [ms] | 720.62 | 583.42
Prediction from 3D model [ms] | 725.84 | 994.83
Participant's actual MT [ms] | 934.13 | 958.87
Difference between predictions [ms] | 5.22 | 411.41
Difference between 2D prediction and actual MT [ms] | 213.51 | 375.45
Difference between 3D prediction and actual MT [ms] | 208.28 | 35.96
[Figure: bar chart of the nominal MT values and the MT predictions by the 2D and 3D models for each participant; vertical axis from 0 to 1600 ms, horizontal axis: Participants]

Fig. 4. Comparison between the participants' nominal MT and the predictions made by the 2D and 3D models


6 Conclusion and Future Work 


The proposed methodology for developing a 3D Fitts' law model has the potential to be incorporated into existing serious game platforms as an effective means of rehabilitation. The results show that the final 3D model predicts human MT for different reaching tasks better than its 2D counterpart. For future work, data from different age demographics must also be collected and added to the model in order to obtain a fully robust methodology for 3D movement time prediction.
Abstract. Augmented Reality (AR) applications may be used to enhance the understanding of physical objects by adding digital information to captured video streams. We propose a new bio-secure system for interacting with bacterial biofilm images using AR technology to improve safety in the experimental laboratory. In the proposed application we used state-of-the-art real-time feature detection and matching methods. Various feature detection and matching methods were also compared with each other in terms of real-time performance and accuracy. Implementing the app on a tablet device (Apple iPad) makes it usable by multiple users in parallel.


Keywords: Multi-user, Real-time, biofilm, Augmented reality. 


1 Introduction 


Bacteria can reproduce simply and rapidly by doubling their contents and splitting in two. A colony of bacteria that sticks to a surface forms a biofilm. Quantities such as the biofilm diffusion coefficient and the dimensions and trajectory of each bacterium are of interest to scientists seeking to understand, and possibly explain, the effect of new drugs on single species of bacteria. Computer vision and imaging techniques could be utilised to support a better understanding of these mechanisms by helping to localize, track and measure bacterial features. Interactive visualisation techniques could also enhance users' understanding; for instance, the user could naturally explore the complex interior structures and morphology of bacteria during the course of biofilm formation. User interactions with visualisation systems may be carried out using either a touch-based interface, such as a keyboard and mouse, or a touchless interface, such as gesture recognition cameras.

In the bio-imaging space, the user can pause a biofilm evolution movie and call up data annotations extracted from the database by selecting a bacterium. According to an earlier study [1], users are more willing to use touch-based interfaces than touchless ones. In most situations only one person interacts with the system. Additionally, users could use mobile handheld devices to capture biofilm fragments and call up augmented information on top of them.
In the visualisation setup, any number of users can interact with the system at the same time, without interfering with one another, or even collaborate with one another. For instance, tapping on a bacterium in the biofilm evolution movie watched through a tablet camera could display related information on the tablet held in the user's hand. The augmented data displayed on a moving bacterium must remain available to the user in the following frames until the user selects another bacterium.

Bacterium morphological properties may vary from frame to frame. Therefore, the major challenge of this research is identifying the specific bacterium a user taps on in a moving image of the biofilm captured by a tablet camera in real time. The initial prototype will assume that tablet devices are aware of the frame number currently displayed on a large screen or hemispherical dome. Image cross-correlation techniques will allow detection of the biofilm sub-image, which will then be used to find the corresponding bacterium automatically.

Displaying a 3D object on the surface of a marker (a point of reference) and estimating the camera position to stabilise the object is not a new concept [18]. Many algorithms have been implemented and are available in various SDKs [16-18]; their API functions allow virtual information to be displayed over a predefined visual marker. Interaction using AR without a predefined marker is classified as marker-less AR, where any part of the real environment may be used as a target to be tracked in order to place a virtual object on it. Marker-less augmented reality relies heavily on natural feature detection in images received through the camera. As soon as a known physical object is detected, the appropriate virtual object may be displayed over it. Detecting a known object requires matching the features in an unknown image viewed through a camera with features from the known object. Features are parts of an image that can be used for image matching or object detection and can usually be classified into three categories: regions, special segments and interest points [4]. Interest points such as corners are usually faster to detect and more suitable for real-time applications. Scale-Invariant Feature Transform (SIFT) [5], Speeded Up Robust Features (SURF) [6] and the Harris corner detector [7] have been used widely in the literature to detect features, but the heavy mathematical computation involved in each of these methods may slow down an application significantly. SCARF [8] and ORB [9] are recent attempts to improve the speed of feature detection.

Feature descriptors are used to describe the image structure in the neighbourhood of a feature point. Using the descriptors, feature points in one image can be matched with features in other images. SIFT, SURF and ORB are among the feature descriptor methods that are rotation and scale invariant. Other feature descriptors, such as BRIEF [10] and Daisy [11], are designed to be fast by sacrificing rotation and scale invariance. Similar feature points (i.e. points with similar feature descriptors) in the source and destination images may represent the same point on a single object in separate views. Matching features using brute-force search (searching among all features in the destination image) [12] is very time-consuming and of little use in real-time applications. FLANN [15] is a library for performing fast approximate nearest neighbour searches in high-dimensional spaces. The library includes a collection of algorithms for performing nearest neighbour search and a system for automatically picking the best algorithm based on the data characteristics. Based on a survey [13], the vantage-point tree is a good method for estimating the nearest neighbour when matching feature descriptors. The vantage-point tree has the best overall construction and search performance and is used in this work.
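As a baseline for the faster search structures discussed above, brute-force matching of binary descriptors can be sketched as follows. Descriptors are assumed to be stored as Python integers (one bit per test), and the function names are illustrative. Each source descriptor is compared with every destination descriptor, so the cost grows with the product of the two set sizes, which is why this approach is too slow for real-time use.

```python
from typing import List, Optional, Tuple


def hamming(a: int, b: int) -> int:
    """Number of differing bits between two binary descriptors stored as ints."""
    return bin(a ^ b).count("1")


def brute_force_match(src: List[int], dst: List[int]) -> List[Tuple[int, Optional[int]]]:
    """For each source descriptor, scan every destination descriptor and return
    (index of the nearest one, its Hamming distance). Cost: len(src) * len(dst)."""
    matches = []
    for s in src:
        best_j, best_d = -1, None
        for j, d in enumerate(dst):
            dist = hamming(s, d)
            if best_d is None or dist < best_d:
                best_j, best_d = j, dist
        matches.append((best_j, best_d))
    return matches


if __name__ == "__main__":
    print(brute_force_match([0b1010, 0b1111], [0b0000, 0b1011, 0b1111]))
```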


2 System Implementation 


The system for interacting with biofilm images through an AR application is confi- 
gured as shown in Figure 1. 


[Figure: system diagram with the components Image Processing Tools, Projector and User Hand Held Device, and the data flows "Sending the bacterium position / receiving morphology information", "Capturing the Frame Number" and "Displaying the Bacteria over Screen through a projector"]

Fig. 1. Overall System Configuration

A bacterium tracking method [2] is used to extract the morphological properties of each bacterium in every frame. The information is stored in a database, which can be accessed individually based on the bacterium position in the biofilm.
The biofilm evolution movie, which is displayed on a wall surface by a projector, is viewed through the camera of a handheld device (tablet). Interaction with the images displayed on the wall takes place through an augmented reality application.

The following tasks are performed in order to add the virtual information archived for every bacterium in the database through the AR application:

1. A server displays the biofilm movie on the large screen and continuously updates a variable used to keep track of the frame number.

2. Users watch the movie through the camera of a handheld device (tablet). The video filmed by the camera is displayed on the handheld device.

3. Tapping on a single bacterium in the live video on the handheld device triggers the information retrieval process.

4. The frame number is fetched from the server.

5. The bacterium position is calculated in the biofilm image that corresponds to the frame number.

6. The bacterium position is used to extract the required information from the database.

7. The bacterium is highlighted and its information is displayed (augmented) on the handheld device.

8. Until the next tap, the previously detected bacterium position is used to update the location of the bacterium's virtual information in subsequent images. This is explained further in Section 2.5.

As discussed above, the major challenge is locating the position of a tapped bacterium in the biofilm sub-image on the handheld device. The methods used and the reasons for selecting them are described in the following sections.


2.1 Feature Detector 


Features are detected, and their descriptors matched, around a tapped bacterium in order to retrieve the bacterium's position in the original biofilm image. FAST is one of the fastest corner detection methods and is suitable for real-time applications [3]. It is based on comparing pixel intensities around a circular neighbourhood of a query point. The 16 pixels on the circle are labeled from 1 to 16 clockwise. If a set of N contiguous pixels on the circle are all brighter than the intensity of the candidate pixel p plus a threshold t, or all darker than the intensity of p minus t, then p is classified as a corner.
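The segment test described above can be sketched for a single candidate pixel. This is an illustration, not the authors' implementation. It assumes the usual 16-pixel Bresenham circle of radius 3 and a default of N = 12 contiguous pixels (FAST is also commonly run with N = 9). The image is a plain list of rows indexed `img[y][x]`. The FAST paper's high-speed pre-test and its non-maximum suppression are omitted.

```python
from typing import Sequence

# Offsets (dx, dy) of the 16-pixel circle of radius 3, listed clockwise starting
# from the pixel directly above the candidate (image y grows downward).
CIRCLE = [(0, -3), (1, -3), (2, -2), (3, -1), (3, 0), (3, 1), (2, 2), (1, 3),
          (0, 3), (-1, 3), (-2, 2), (-3, 1), (-3, 0), (-3, -1), (-2, -2), (-1, -3)]


def is_fast_corner(img: Sequence[Sequence[int]], x: int, y: int,
                   t: int, n: int = 12) -> bool:
    """True if n contiguous circle pixels are all brighter than p + t
    or all darker than p - t, where p is the intensity at (x, y)."""
    p = img[y][x]
    # Label each circle pixel: +1 brighter, -1 darker, 0 similar.
    labels = []
    for dx, dy in CIRCLE:
        v = img[y + dy][x + dx]
        labels.append(1 if v > p + t else (-1 if v < p - t else 0))
    # Walk the circle twice so that runs wrapping past pixel 16 are counted.
    for target in (1, -1):
        run = 0
        for lab in labels + labels:
            run = run + 1 if lab == target else 0
            if run >= n:
                return True
    return False


if __name__ == "__main__":
    spot = [[0] * 7 for _ in range(7)]
    spot[3][3] = 100  # isolated bright pixel: every circle pixel is darker
    print(is_fast_corner(spot, 3, 3, t=20))
```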


2.2 Feature Descriptor 


We used BRIEF as the feature descriptor. The BRIEF descriptor is formulated as follows:


1. Select a series of N location pairs (X, Y) around the feature point, where X = (x_1, y_1) and Y = (x_2, y_2).

2. For every selected pair (X, Y), compute the test

   $T(X, Y, P) = \begin{cases} 1 & \text{if } p(X) < p(Y) \\ 0 & \text{otherwise} \end{cases}$   (1)

Here P represents a patch around the feature point, T represents a test on patch P and p(X) is the pixel intensity at pixel X. Performing the above test on N pairs creates a binary string that serves as the feature descriptor. The size of the patch around the feature point and the selection of the N pairs can affect the accuracy of the method. A study [10] shows that selecting the pairs from a double Gaussian distribution, or randomly around the feature point, can produce better results. The advantage of a binary string as a descriptor is that the distance between two descriptors (e.g. the Hamming distance) can be calculated very quickly on many processors [10]. Our experiments show that the major bottleneck of marker-less AR is the feature matching step (Section 2.3). Selecting a simple descriptor that is fast to calculate is therefore the only way to achieve near real-time AR. This is discussed further in Section 3.
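A minimal sketch of equation (1) follows. It assumes the N pairs are sampled uniformly inside a square patch from a fixed seed, which is one of the random strategies mentioned above; the Gaussian sampling from [10] is not shown. It also assumes bit i of the descriptor stores test i. The original BRIEF method smooths the patch before testing, and that step is omitted here. The function names are illustrative.

```python
import random
from typing import List, Sequence, Tuple

Point = Tuple[int, int]
Pair = Tuple[Point, Point]


def make_pairs(n_bits: int = 256, patch_size: int = 31, seed: int = 0) -> List[Pair]:
    """Sample n_bits (X, Y) offset pairs uniformly inside a square patch.
    The same pairs must be reused for every feature point."""
    rng = random.Random(seed)
    h = patch_size // 2

    def offset() -> Point:
        return (rng.randint(-h, h), rng.randint(-h, h))

    return [(offset(), offset()) for _ in range(n_bits)]


def brief_descriptor(img: Sequence[Sequence[int]], x: int, y: int,
                     pairs: List[Pair]) -> int:
    """Bit i is T(X_i, Y_i, P) = 1 if p(X_i) < p(Y_i), else 0 (equation (1)).
    Offsets are relative to the feature point (x, y); img is indexed img[row][col]."""
    desc = 0
    for i, ((x1, y1), (x2, y2)) in enumerate(pairs):
        if img[y + y1][x + x1] < img[y + y2][x + x2]:
            desc |= 1 << i
    return desc


def hamming(a: int, b: int) -> int:
    """Distance between two binary descriptors: popcount of their XOR."""
    return bin(a ^ b).count("1")
```

Because the descriptor is a bit string, comparing two features reduces to an XOR and a bit count. This is the property that makes BRIEF attractive when matching dominates the running time.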


2.3 Feature Matching


We use a vantage-point tree as the feature matching method. The matching method itself and a technique to improve the search speed are discussed below.


Vantage-Point (VP) Search Tree Construction. The idea of constructing a binary search tree is to divide the search space recursively based on a similarity measure in order to increase search speed, which is possible by pruning nodes that cannot contain a better answer than the best one already found. Rather than partitioning points on the basis of relative distance from multiple centres (as is the case with k-means), a VP-tree splits points using the absolute distance from a single centre. Tree construction begins by assigning all points to the root node and then recursively partitioning the points among the children of the node. This process continues until some termination criteria are met. Two common criteria are the maximum leaf size (the leaf contains fewer than a given number of points) and the maximum leaf boundary [12].

The algorithm for constructing the vantage-point tree, with the Hamming distance as the measure of similarity between two bit strings, is summarized in Algorithm VantagePointTree.


VantagePointTree (lower, upper)
    If termination condition is met, return
    Create a node
    Select a random bit string in the search space as the
        vantage point, move it to index lower and place it in the node
    Sort the other bit strings in ascending order of their
        distance to the vantage point
    Select the median bit string
    Keep the distance between the vantage point and the
        median as the vantage-point boundary in the tree node
    Node left child = VantagePointTree(lower + 1, median)
    Node right child = VantagePointTree(median, upper)
In this algorithm, lower and upper are indices into the array used to store the binary strings. The search algorithm is shown in Algorithm VantagePointTreeSearch.


VantagePointTreeSearch (target, node, σ)
    If node = Null return
    dist = distance(node, target)
    If dist < σ
        Keep node as the best match found so far
        σ = dist
    If leftChild(node) = empty and rightChild(node) = empty return
    If dist < node threshold
        If dist - σ <= node threshold
            VantagePointTreeSearch(target, node left child, σ)
        If dist + σ >= node threshold
            VantagePointTreeSearch(target, node right child, σ)
    Else
        If dist + σ >= node threshold
            VantagePointTreeSearch(target, node right child, σ)
        If dist - σ <= node threshold
            VantagePointTreeSearch(target, node left child, σ)


In this algorithm σ is the smallest distance found so far; it starts at infinity and is shared by all recursive calls, so it shrinks as better matches are found. The target binary string is the descriptor of a feature in the source image.

The tree construction is performed for every frame in the original biofilm evolution 
movie beforehand, so that the tree construction phase does not have any effect on 
application processing speed. 
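The two algorithms above can be sketched together in Python. This is a minimal version under stated assumptions, not the authors' code. Descriptors are Python integers compared with the Hamming distance. Recursion stops only when no points remain, so every leaf holds a single point; the leaf-size criterion mentioned above is not modelled. σ is kept in a small mutable list so that all recursive calls share it.

```python
import random
from dataclasses import dataclass
from typing import List, Optional, Tuple


def hamming(a: int, b: int) -> int:
    return bin(a ^ b).count("1")


@dataclass
class Node:
    point: int                      # vantage-point descriptor
    index: int                      # its position in the original descriptor list
    threshold: int = 0              # distance from vantage point to the median
    left: Optional["Node"] = None   # points with distance <= threshold
    right: Optional["Node"] = None  # points with distance >= threshold


def _build(items: List[Tuple[int, int]], rng: random.Random) -> Optional[Node]:
    if not items:                   # termination: nothing left to place
        return None
    vp_pos = rng.randrange(len(items))
    vp, vp_idx = items[vp_pos]
    node = Node(vp, vp_idx)
    rest = items[:vp_pos] + items[vp_pos + 1:]
    rest.sort(key=lambda it: hamming(vp, it[0]))  # ascending distance to vp
    if rest:
        mid = len(rest) // 2
        node.threshold = hamming(vp, rest[mid][0])
        node.left = _build(rest[:mid], rng)
        node.right = _build(rest[mid:], rng)
    return node


def build_vp_tree(descriptors: List[int], seed: int = 0) -> Optional[Node]:
    return _build(list(zip(descriptors, range(len(descriptors)))), random.Random(seed))


def _search(node: Optional[Node], target: int, best: list) -> None:
    """best = [index, sigma]; updated in place as closer points are found."""
    if node is None:
        return
    dist = hamming(node.point, target)
    if dist < best[1]:
        best[0], best[1] = node.index, dist
    if node.left is None and node.right is None:
        return
    if dist < node.threshold:       # target inside the boundary: left side first
        if dist - best[1] <= node.threshold:
            _search(node.left, target, best)
        if dist + best[1] >= node.threshold:
            _search(node.right, target, best)
    else:                           # target outside the boundary: right side first
        if dist + best[1] >= node.threshold:
            _search(node.right, target, best)
        if dist - best[1] <= node.threshold:
            _search(node.left, target, best)


def nearest(tree: Optional[Node], target: int) -> Tuple[int, float]:
    """Return (index of the nearest descriptor, its Hamming distance)."""
    best = [-1, float("inf")]
    _search(tree, target, best)
    return best[0], best[1]
```

Because construction is done offline per movie frame, only `nearest` runs at interaction time. The pruning tests skip a subtree only when the triangle inequality guarantees that no point in it can be closer than σ.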


Increasing Search Speed Using Triangle Inequality. The search algorithm may not need to process all the points in a leaf if the distance between every point in the leaf and the leaf node is calculated during tree construction. Let {b_1, b_2, ..., b_n} be the points in the leaf, B the leaf node, and d(b_1, B) ≥ d(b_2, B) ≥ ... ≥ d(b_n, B), where d is the Hamming distance. Based on the triangle inequality we have

$d(\mathrm{target}, B) \le d(\mathrm{target}, b_i) + d(B, b_i), \quad i = 1, 2, \ldots, n$   (2)

$d(\mathrm{target}, B) - d(B, b_i) \le d(\mathrm{target}, b_i), \quad i = 1, 2, \ldots, n$   (3)

So d(target, B) - d(B, b_i) is a lower bound for d(target, b_i). If, at any stage of searching the leaf points, we find j ∈ {1, ..., n} with d(target, B) - d(B, b_j) > σ, the algorithm stops searching the remaining points. Since the distances d(b_i, B), i ∈ {1, ..., n}, are sorted in descending order, the lower bound can only grow for the remaining points, so their distances to the target will be higher than σ.
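The early-termination rule derived from inequality (3) can be sketched as a leaf scan. This assumes the leaf stores each point together with its precomputed distance to the leaf node B, sorted in descending order as in the text. The helper names are hypothetical. The function also reports how many points were actually compared, which makes the saving visible.

```python
import math
from typing import List, Optional, Tuple


def hamming(a: int, b: int) -> int:
    return bin(a ^ b).count("1")


def make_leaf(B: int, points: List[int]) -> List[Tuple[int, int]]:
    """Precompute d(b_i, B) at construction time and sort in descending order."""
    return sorted(((b, hamming(b, B)) for b in points), key=lambda e: e[1], reverse=True)


def search_leaf(target: int, B: int, leaf: List[Tuple[int, int]],
                sigma: float = math.inf) -> Tuple[Optional[int], float, int]:
    """Return (best point or None, updated sigma, number of points compared)."""
    d_tB = hamming(target, B)
    best, checked = None, 0
    for b, d_Bb in leaf:
        # d(target, B) - d(B, b) is a lower bound on d(target, b). Later points
        # have smaller d(B, b), so their bounds are even larger: stop here.
        if d_tB - d_Bb > sigma:
            break
        checked += 1
        d = hamming(target, b)
        if d < sigma:
            best, sigma = b, d
    return best, sigma, checked


if __name__ == "__main__":
    leaf = make_leaf(0x00, [0x01, 0xF0, 0x07])
    print(search_leaf(0xFF, 0x00, leaf, sigma=5))  # only one point is compared
```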


2.4 Matching and Outliers Removal 


Fig. 2 displays pairs of matched features between the source image (the images on the left, viewed through the handheld device camera) and the destination image (biofilm images stored in the database). As the images show, although there are some correct matches, there are also many mismatches that must be removed before further processing. Estimating a homography using RANSAC [14] can be used to remove these outliers. The result after removing the outliers is shown in Fig. 3.
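Homography estimation with RANSAC can be sketched with NumPy as follows. This is a minimal illustration of the general technique, not the authors' implementation. It uses the direct linear transform without coordinate normalisation, a fixed number of iterations, and a pixel reprojection threshold; all three are assumptions. Production code would more likely call a library routine such as OpenCV's `cv2.findHomography` with the `cv2.RANSAC` flag.

```python
import numpy as np


def fit_homography(src: np.ndarray, dst: np.ndarray) -> np.ndarray:
    """Direct linear transform from >= 4 point pairs (no coordinate normalisation)."""
    rows = []
    for (x, y), (u, v) in zip(src, dst):
        rows.append([-x, -y, -1.0, 0.0, 0.0, 0.0, u * x, u * y, u])
        rows.append([0.0, 0.0, 0.0, -x, -y, -1.0, v * x, v * y, v])
    # H is the right singular vector with the smallest singular value.
    _, _, vt = np.linalg.svd(np.asarray(rows, dtype=float))
    H = vt[-1].reshape(3, 3)
    return H / H[2, 2]


def project(H: np.ndarray, pts: np.ndarray) -> np.ndarray:
    """Apply H to an (N, 2) array of points, including the perspective divide."""
    homog = np.hstack([pts, np.ones((len(pts), 1))]) @ H.T
    return homog[:, :2] / homog[:, 2:3]


def ransac_homography(src, dst, thresh=3.0, iters=500, seed=0):
    """Return (H, inlier_mask): keep the 4-point model with the largest
    consensus set, then refit H algebraically on all of its inliers."""
    src = np.asarray(src, dtype=float)
    dst = np.asarray(dst, dtype=float)
    rng = np.random.default_rng(seed)
    best = np.zeros(len(src), dtype=bool)
    for _ in range(iters):
        sample = rng.choice(len(src), size=4, replace=False)
        with np.errstate(all="ignore"):  # degenerate samples may yield inf/nan
            H = fit_homography(src[sample], dst[sample])
            err = np.linalg.norm(project(H, src) - dst, axis=1)
            inliers = err < thresh       # nan errors compare as False
        if inliers.sum() > best.sum():
            best = inliers
    return fit_homography(src[best], dst[best]), best
```

Matches whose mask entry is False are the mismatches to discard before locating the tapped bacterium.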


Fig. 3. Removing outliers 


Multi-users Real-Time Interaction with Bacterial Biofilm Images 305 


2.5 Bacteria Position Retrieval and Displaying Information 


The homography matrix is used to map the tapped position from handheld device coordinates to image coordinates. The database is then searched for the closest bacterium, and the information for this bacterium is displayed on the handheld device. The inverse of the homography matrix, together with the bacterium position in image coordinates, is also used to track the position of the last tapped bacterium in subsequent frames until a new tap occurs. This allows the virtual information to be displayed at the right position even if the handheld device moves in a different direction (Fig. 4).
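The retrieval and tracking steps above can be sketched as three small functions. This assumes a device-to-image homography `H_dev_to_img` has already been estimated for the current frame. It also assumes a hypothetical in-memory list of bacterium records with `x` and `y` image coordinates, standing in for the database query. The closest bacterium is chosen by Euclidean distance, and the overlay position comes from the inverse homography.

```python
import numpy as np


def apply_h(H: np.ndarray, x: float, y: float) -> tuple:
    """Map one point through a 3x3 homography (with the perspective divide)."""
    u, v, w = H @ np.array([x, y, 1.0])
    return float(u / w), float(v / w)


def closest_bacterium(pos, bacteria):
    """Nearest record by Euclidean distance in image coordinates.
    `bacteria` is a hypothetical list of dicts with 'x' and 'y' keys."""
    return min(bacteria, key=lambda b: (b["x"] - pos[0]) ** 2 + (b["y"] - pos[1]) ** 2)


def on_tap(H_dev_to_img: np.ndarray, tap_xy, bacteria):
    """Device tap -> image coordinates -> closest bacterium record."""
    return closest_bacterium(apply_h(H_dev_to_img, *tap_xy), bacteria)


def overlay_position(H_dev_to_img: np.ndarray, bacterium) -> tuple:
    """Screen position of the label in the current frame, via the inverse homography."""
    return apply_h(np.linalg.inv(H_dev_to_img), bacterium["x"], bacterium["y"])


if __name__ == "__main__":
    H = np.array([[2.0, 0.0, 100.0], [0.0, 2.0, 50.0], [0.0, 0.0, 1.0]])
    db = [{"id": 1, "x": 118.0, "y": 72.0}, {"id": 2, "x": 200.0, "y": 200.0}]
    picked = on_tap(H, (10.0, 10.0), db)
    print(picked["id"], overlay_position(H, picked))
```

For each new camera frame, `overlay_position` would be called with that frame's homography. This is how the label stays attached to the bacterium as the device moves.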


Fig. 4. Displaying the information at the right position in different device orientations


3 Experimental Results 


The application frame rate was measured for different combinations of feature detector and descriptor methods. The application was run for 30 seconds and the frame rate was recorded prior to feature matching. As Fig. 5 shows, the combination of the FAST feature detector and the BRIEF feature descriptor (Fig. 5 c) is the best choice for a real-time application. Note that this result holds for high-density biofilm image sets and may not hold for other image sets.

The accuracy of the application was also evaluated and compared with other implementations of the application using different feature detection and matching methods. Accuracy is estimated by measuring the acceptable range of device rotation, i.e. the maximum rotation in every direction before the application loses the bacterium position between two consecutive taps (refer to Section 2.5). This is carried out by comparing the positions extracted from the inverse homography of the different matching methods with those from the SURF matching inverse homography in different device orientations. SURF was selected as the baseline because of its rotation and scale invariance. The results are shown in Fig. 6; these images were produced while the device was rotated around the vertical axis. Fig. 6 shows that the acceptable device rotation range for FAST/BRIEF feature matching is limited to [-5.05, 25.80] (Fig. 6 b), which is narrower than those of the rotation- and scale-invariant feature detectors and descriptors. This means that the user can only use the application in situations where there are no significant changes in the vertical orientation of the handheld device.
Fig. 5. Frame rates achieved during 30-second experiments using different methods
Fig. 6. Difference between the positions estimated using the inverse homography of various methods and SURF feature and descriptor matching


4 Conclusions 


The lower processing power of handheld devices compared with desktop computers raises the need for a low-computation approach for real-time applications. The approach used in this paper employs a feature descriptor that is not scale and rotation invariant. The application gives the user a real-time AR experience, but the limited acceptable range of device rotation reduces its usability. Overall, the experiments reveal that real-time, rotation- and scale-invariant feature detection and description in high-density environments remains an ongoing research problem.
Abstract. The population is aging fast, and aging brings cognitive impairments that often require costly facility care. This paper proposes Smart Glasses that can help alleviate these impairments at their early stages and thus allow senior citizens to stay out of facility care longer. The Smart Glasses produce exogenous cues to attract the user's attention. Four usability experiments are described to evaluate the utility of the cues and other usability factors of the proposed system. We expect the results to give us valuable information on how to improve the design of the system based on senior citizens' needs.


Keywords: smart glasses, aging in-place, assistive technology, attention con- 
trol, cognitive impairment. 


1 Introduction 


Ferri et al. estimated that 24.3 million people suffered from dementia in 2005 and that 4.6 million people are added to this number every year. The number of people suffering from dementia is predicted to reach about 81.1 million in 2040. Unobserved cases of dementia should also be added to these estimates. [4]

These statistics and estimates have led numerous researchers to develop tools and systems that support the health care of senior citizens in their homes. This concept is known as aging-in-place. Supporting senior citizens' independent daily life and monitoring their health, safety, and physical and cognitive functioning are the main purposes of these new tools and systems. [2]

Common problems among senior citizens are memory-related issues, ranging from simple age-related problems to Alzheimer's Disease. A collaborative study in the Nordic countries [5] was conducted on individuals with dementia; its goal was to find out what kinds of aid devices are used for assistance and how suitable they were for the users, and to gather improvement feedback for aid device researchers [8]. The conclusions indicated that introducing aid devices for caretakers and people suffering from dementia improved the management of daily activities; it helped caretakers and patients maintain skills and made people socially more active. Prior research has also suggested that navigation technology has the potential to provide important support for the elderly by similarly motivating and empowering them to perform their daily activities. [7]

The ability to achieve and maintain focus of cognitive activity on a given task is a fundamental component of human cognitive capacity. Research on the visual capabilities of the elderly has concluded that aging itself brings a decline in both cognitive abilities and the capabilities of the visual system, compounded by the constraints brought by dementia. [9], [12]

Research on the attentional capacity of the elderly [1] suggests that both normal aging and Alzheimer's Disease (AD) impair people's performance in reaction tests, but also concludes that people in the earlier phases of AD were not significantly more impaired by an increase in the difficulty of a given task than the normal elderly. AD patients may have more problems filtering interference from similar background material. The paper concludes that there was no apparent decline in the capacity to divide attention with age, whereas there was a clear impairment in the dual-task performance of AD patients.

Visual performance of humans depends on both operational variables and physical 
variables. The operational variables include age, visual capabilities (contrast and light 
sensitivity, color and depth perception) and the characteristics of the task. The physi- 
cal variables consist of lighting conditions, disability or discomfort glare, and colors 
in the vicinity, among others. In addition, several cognitive processes affect how in- 
formation is filtered for processing through the general physical features. Attention 
has been described as limited by the mental effort available, and the limited cognitive 
capacity of attention can be actively spread over several cognitive demands at a time. 
How much attentional capacity and finite processing resources are allocated and 
needed for each task is determined by a combination of factors. [9] 

There is also evidence that endogenous and exogenous cues have separate and additive effects. Endogenous cues allow the participant to direct their attention to the indicated location at will, which also implies that the symbology of the cues must be understood and their meaning remembered throughout the task. Exogenous cues, such as a flash of light, attract attention automatically; such a cue is still effective even if the participant's cognitive resources are occupied elsewhere. [13]

We have founded our approach for the Smart Glasses on the premises set by the referenced literature. Section 2 describes the system setup, Section 3 explains the test setup and the usability tests we have planned, and Section 4 concludes the paper.


2 Smart Glasses System 


The first version of the Smart Glasses prototype contains 12 red LEDs and 12 green LEDs, as presented in Figure 1, and the second prototype version contains 6 red LEDs and 6 green LEDs, as presented in Figure 2. The LEDs are positioned on the frames of the Smart Glasses and are controlled by TLC5940 drivers. The drivers are connected to a micro-controller (low-power ATmega168V) via a serial communication bus. The commands for different LED patterns are received through the wireless communication module. A Li-ion battery supplies power to the micro-controller. The micro-controller is connected to a Bluetooth Serial Port Profile (SPP) module, which is the communication gateway between the micro-controller and an Android application. SPP is used to send 32-bit control messages from the remote controlling device (an Android tablet) to the Smart Glasses. The remote controlling device also translates the 32-bit control messages into voice commands and sends them to an audio device via Bluetooth.


Fig. 1. First prototype version of smart glasses having 12 green LEDs and 12 red LEDs posi- 
tioned on the frames 


Fig. 2. Second prototype version of smart glasses having 6 green LEDs and 6 red LEDs posi- 
tioned on the frames 


3 Usability Test for Smart Glasses 


The main objective of conducting a usability experiment is to remove blocking and problematic issues from the user's path through the application. Problematic issues mostly prevent the application from achieving its maximum desired usability. Analyzing the tasks of a usability test helps to design the user interface and application concept more accurately. Usability testing should involve four to six participants for the results to be reliable, and a final report should outline the findings and provide developers with recommendations for redesigning the system. [3], [10]

A usability experiment setting is defined by a specific number of participants, a moderator and a set of tasks to test the system. It identifies problems that remained hidden from the developers' point of view during the development process. In order to organize usability testing, a set of assumptions should be predefined before conducting it, and these assumptions should then be evaluated after the testing. [6], [11]

In order to measure usability in an experiment, it is necessary to define the following factors:


• Effectiveness means the user's ability to accomplish tasks.
• Efficacy means the user's ability to accomplish tasks quickly, without difficulty and frustration.
• Satisfaction means how much the user enjoys doing the tasks.
• Error frequency and severity means how often the user makes errors and how serious the errors are.
• Learnability means how much the user could learn to use the application after doing the first task.
• Memorability means how much the user could remember from one task during the next tasks.


Separate tasks could be designed to evaluate different usability factors. [6], [11] 


3.1 Test Setting 


Subjects in all the experiments will be senior citizens suffering from dementia; people suffering from other illnesses that could potentially affect their performance in the tests, such as color-blindness, tinnitus or Parkinson's Disease, will be excluded. The minimum number of participants in each experiment will be four. An observer representing the medical center will be present in all the experiments.

In order to evaluate satisfaction properly, the observer will be advised to encourage participants to think aloud during the experiment and to give feedback to the observer at the end of each experiment. Qualitative questionnaires will be presented after each experiment to collect participants' satisfaction and preferences.

One video camera will be used to record participants’ actions and another video 
camera will be used to record their eye movements during the tests. The recordings 
will be synchronized and time-stamped, which will help to investigate the sequence of 
events properly. 

Different kinds of test applications on an Android tablet will be used to record the results and log other necessary information on the experiments. Together with the qualitative questionnaires, these tools will help us to investigate the effectiveness, efficacy, satisfaction and learnability of the system.


3.2 Test Scenarios 


We have defined four usability tests to evaluate usability factors of Smart Glasses 
system. The foremost purpose will be to establish the feasibility of the designed Smart 
Glasses for the indoor and outdoor navigation scenarios. The second objective will be 
to measure usability factors that can have either strengthening or weakening effect on 
the design. Salient factors are effectiveness, efficacy, satisfaction and learnability of 
the system. 

The first test will focus on finding the best way for the system to attract the participant's attention. In the second test we will ask for the participants' opinion of the best pattern for indicating all possible directions. The third test will tell us how well the participant can follow the navigation instructions given by the Smart Glasses by moving their finger on a tablet PC in the direction indicated. Finally, the fourth set of tests will first be conducted in open space indoors, where the participant walks through a predefined route with the help of the Smart Glasses. This test will then be repeated in open space outdoors to capture how changes in ambient light and sounds affect the usability of the Smart Glasses.

The first test is designed to identify how accurately senior citizens can recognize precisely which LED on the Smart Glasses is lit or blinking. At the same time, this test aims to identify how accurately they can recognize the general direction in which the LED is lit or blinking. Depending on the Smart Glasses prototype version, a direction is indicated by lighting up a single LED or a combination of LEDs. A test application for the tablet PC will be developed to store participants' responses. Participants will be divided into two groups, one using the prototype version with six LEDs per lens and the other using the version with three LEDs per lens. A number of LED lighting sequences will be defined beforehand and presented in random order to avoid any learning effect from one test to the next. By comparing the results obtained with the different LED configurations, we hope to determine the number and configuration of LEDs per lens that yields the best results. After the most suitable LED pattern per lens has been identified, two further experiments will be conducted.
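The randomized presentation and the two accuracy measures of the first test can be sketched in Python. This is a minimal illustration, not the planned test application: the LED numbering (0 to N-1 across one lens), the split of a lens into left, centre and right thirds as the "general direction", and the use of the participant ID as a shuffle seed are all assumptions made for the example.

```python
import random


def make_sequences(num_leds, length, count, seed):
    """Predefine `count` LED sequences (indices 0..num_leds-1) before testing."""
    rng = random.Random(seed)
    return [[rng.randrange(num_leds) for _ in range(length)] for _ in range(count)]


def presentation_order(sequences, participant_id):
    """Present the predefined sequences in a per-participant random order."""
    order = list(range(len(sequences)))
    random.Random(participant_id).shuffle(order)
    return [sequences[i] for i in order]


def general_direction(led, num_leds):
    """Hypothetical mapping: split a lens into thirds (0 = left, 1 = centre, 2 = right)."""
    return led * 3 // num_leds


def score(shown, answered, num_leds):
    """Return (exact-LED accuracy, general-direction accuracy) for one participant."""
    pairs = list(zip(shown, answered))
    exact = sum(s == a for s, a in pairs) / len(pairs)
    general = sum(
        general_direction(s, num_leds) == general_direction(a, num_leds) for s, a in pairs
    ) / len(pairs)
    return exact, general


# Six-LED group: the exact LED is named in 2 of 4 trials, the general direction in 3 of 4.
print(score([0, 3, 5, 2], [1, 3, 5, 4], num_leds=6))  # (0.5, 0.75)
```

Because both groups are scored with the same two measures, the six-LED and three-LED configurations can be compared directly on exact and general-direction accuracy.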

In the second test, we will present the participants with all feasible LED combinations for a given direction. We will then ask them which pattern they most strongly associate with that direction.

The third test will add a Bluetooth headset to accompany the Smart Glasses. In addition, a tablet PC with a stylus and two cameras will be used. An application running on the tablet PC will communicate with the Smart Glasses and the headset via Bluetooth. The application user interface is designed as a grid of a specific number of cells separated by invisible lines. The operator selects a route to follow from a predefined set of routes. A route is a sequence of adjacent cells (Figure 3) with a starting point and an endpoint. When the participant moves the pen on the screen from one cell to another, the application recognizes whether the pen is moving along the route. The application calculates the next movement direction from the current position of the pen and its relation to the next cell in the route. After a specific time delay, a new direction indication is sent to the Smart Glasses and the headset. If the participant makes an error and moves the pen to a cell outside the route, the application provides a direction indication towards the nearest cell in the route. The application will guide the participant periodically to reduce the number of errors during the usability testing. To evaluate the intuitiveness and learnability of the system, this test will be conducted in three variations: audio only, LEDs and audio, or LEDs only. By evaluating the results, we will also be able to determine whether the modalities support or hinder each other for cognitively impaired participants.
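The guidance logic of the tablet application in the third test can be sketched as follows. This is a hypothetical sketch, not the application's code: cells are assumed to be (row, column) pairs with rows growing downward, the route is assumed never to revisit a cell, "nearest" is taken as the Euclidean distance between cell indices, and the Bluetooth messaging and time delay are reduced to a plain list of (channel, cue) pairs.

```python
import math

# Hypothetical cue names; rows grow downward on the tablet screen.
DIRECTION_NAMES = {
    (-1, 0): "up", (1, 0): "down", (0, -1): "left", (0, 1): "right",
    (-1, -1): "up-left", (-1, 1): "up-right", (1, -1): "down-left", (1, 1): "down-right",
}

# The three test variations described in the text.
VARIATIONS = [("audio",), ("leds", "audio"), ("leds",)]


def sign(x):
    return (x > 0) - (x < 0)


def next_direction(route, pen_cell):
    """Cue for the pen's current cell; None once the endpoint is reached."""
    if pen_cell in route:
        i = route.index(pen_cell)
        if i == len(route) - 1:
            return None
        target = route[i + 1]  # on the route: point at the next cell
    else:
        # Off the route: point towards the nearest route cell.
        target = min(route, key=lambda cell: math.dist(cell, pen_cell))
    dr, dc = target[0] - pen_cell[0], target[1] - pen_cell[1]
    return DIRECTION_NAMES[(sign(dr), sign(dc))]


def dispatch(direction, variation):
    """Pair the cue with every output channel used in one test variation."""
    return [(channel, direction) for channel in variation]
```

With route = [(0, 0), (0, 1), (0, 2), (1, 2), (2, 2)], a pen in (0, 0) yields "right", a pen that has strayed to (1, 0) is pointed "up" back to the route, and the endpoint (2, 2) yields None. Keeping the direction calculation separate from `dispatch` means the same cue can be sent to the LEDs, the headset, or both, which is what the three variations require.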

The fourth and last test is a navigation experiment that guides the users along a specific route in both an indoor and an outdoor environment. The routes will be predefined, with a fixed number of turns in each direction and a predetermined length. Each participant will be randomly assigned a route from the set. These navigation tests will evaluate not only the usability of the system under more realistic conditions, but also the influence of ambient light on the visibility of the LEDs and the effect of ambient sounds from the environment on the audio cues.
Fig. 3. A predefined route in the third test contains a sequence of cells (gridlines are not visible to the participants)


4 Conclusion 


In this paper, we have described our Smart Glasses approach to assisting senior citizens in their daily activities. Four usability tests have been defined to evaluate the usability factors of the system. We will conduct the tests in the next few months and report the results at HCII 2014. During testing, we will iterate on the design of the Smart Glasses based on test results and participant feedback.
Abstract. The aim of this research effort is to identify amplifiers of the feeling of presence and of metacognition that can be applied over existing, well-established VRET treatment methods. Real-time projection of the patient into virtual environments during stimulus exposure and the sharing of electroencephalography (EEG) reports are among the techniques used to achieve this. Starting from theoretical inferences, the work has moved towards a proof-of-concept prototype, developed as a realization of the proposed method. The prototype was evaluated by an expert team of 28 therapists using the fear of public speaking and fear of flying case studies.


Keywords: Virtual Reality Exposure Therapy, Anxiety Disorders, Sense of 
Presence, Metacognition, Fear of Public Speech, Fear of Flying. 


1 Introduction 


Virtual Reality Exposure Therapy (VRET) is a technique that uses Virtual Reality technology in behavioral therapy for the treatment of anxiety disorders. Since many people suffer from disorders such as social phobia, VRET therapies that rely on Computer Based Treatment (CBT) principles to establish a diagnosis and evaluate the patient's progress constitute a promising method. VR interfaces enable the development of real-world models to interact with. In application areas other than phobia therapy, such as cultural and scientific visualization, education and infotainment, the aim is to alter the model so that the user can navigate the artificially created environment in an immersive manner. Using VR environments, people can immerse themselves in models ranging from microscopic to universal scale, e.g. from molecules to planets. In phobia treatment this rule is reversed: the aim is to change the behavior of the user after exposure to visual and auditory stimuli in a simulated experience.


R. Shumaker and S. Lackey (Eds.): VAMR 2014, Part II, LNCS 8526, pp. 316-328, 2014.
© Springer International Publishing Switzerland 2014 


Sense of Presence and Metacognition Enhancement in Virtual Reality Exposure Therapy 317 


1.1. Past Projects and Short History of VRET 


VR in the service of cognitive-behavior therapies (CBT) has offered a great deal over the past decades, with advantages including the generation of stimuli for multiple senses, active participation and applicability to the most frequent phobias. Today, it is considered very effective from a psychotherapeutic standpoint, especially in carefully selected patients [23]. For example, Social Anxiety Disorder, the most common anxiety disorder [28], can be treated using VRET systems [17] [13] [4]. There is a great variety of VRET systems for specific phobias, such as fear of flying [2] [19], cockroach phobia [3] and dog phobia [9], to name a few. More information can be found in the extensive list (300 studies) of the meta-analysis by Parsons & Rizzo [23].


1.2. Facts about Phobias 


Over 2.2% of the adult population in Europe suffers from Social Phobia [31]. Although anxiety disorders can be treated in most cases, only one third of sufferers receive treatment, and even then the specific phobia is not the primary reason for seeking treatment [14] [5]. In fact, only 26% of people with mental disorders have made contact with formal health services [1]. Similarly, the US National Institute of Mental Health (NIMH) reports a 12-month prevalence of Social Phobia of 6.8% in the US adult population, while 29.9% of these cases (i.e. 2.0% of the adult population) are classified as severe [16]. The rate among teenagers (13 to 18 years old) is 5.5%, with a lifetime prevalence of severe disorder of 1.3% [21]. In Greece, the prevalence of all phobias is 2.79% (2.33% in men, 3.26% in women) [25].
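A quick arithmetic check shows that the 2.0% NIMH figure quoted above is the severe subset of the 12-month cases, not an independent rate:

```latex
0.299 \times 6.8\,\% = 2.0332\,\% \approx 2.0\,\%
```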


1.3 Structure of the Paper 


This paper is organized as follows. After the introduction, Section 2 (Requirements of a new approach) identifies the main areas of VRET adaptation in exposure therapy. The therapeutic aims and the functional requirements of the new approach are presented in Section 3 (A more flexible approach). The use cases of the pilot studies and the content development are discussed in Section 4 (Performance situations and content development). Section 5 (The evaluation) presents the results of the prototype evaluation by a group of experts. Finally, an overview of the novel approach and future plans are discussed in Section 6 (Conclusions).


2 Requirements of a New Approach 


The lack of widely accepted standards for the use of VR to treat specific phobias pushes research and clinical practice towards vertical solutions in most cases. What if a new approach could load new content on demand and be programmed by the therapist to adapt to the specific case and parameters of each patient?
In order to design a VRET system that helps therapists achieve a permanent change in the patient's behavior, contemporary efforts should take into account current technological trends, updated psychological research results and certain limitations. For example, haptics are not required for social phobias, while fear of internal states (e.g. fear of vomiting), as opposed to external stimuli, is difficult to replicate in VR.

After thorough research on existing solutions, we identified three main areas of adaptation: A) adaptation to the requirements of therapists, including the special conditions of clinical use and the trends in exposure therapy (e.g. portability, reusability, reliability, effectiveness); B) adaptation to the specific phobia or anxiety disorder in terms of content and functional automation (virtual world, scenarios, avatars, stimuli); and C) adaptation to the needs of individuals (phobia history, level of anxiety, human factors). The following sections discuss certain aspects of adaptation.


2.1. Adaptability in Performance Situations 


Social anxiety disorder covers a wide range of social situations, so making a VRET system adaptable to all of them can be extremely difficult. Instead of creating and using a highly adaptive VRET system with a moderate or poor quality of immersion and presence, a targeted solution would be more appropriate, especially for performance situations.


2.2 Self-awareness


As Hood and Antony note, phobia sufferers ‘exhibit biased information processing related to specific threats, while their attention and interpretation are biased’ [14]. The mechanism behind this, as well as the result itself, remains invisible to sufferers, even though most of them understand that they overreact. The difficulty seems to lie in error estimation, because patients are not able to see themselves and the outcome of their overreaction during exposure to stimuli.


2.3. Feeling of Presence 


According to Eichenberg [10], VR is experienced as realistic under the conditions of 
‘immersion’ (virtual world perceived as objective and stimulating) and ‘presence’ (the 
subjective experience of ‘being there’). The feeling of presence, or Sense of Presence (SoP), and immersion are logically separable, with the former considered ‘a response to a system of a certain level of immersion’ [26]. It is believed that a projected world model requires a strong SoP in order to be therapeutically useful [18] [6].


2.4 User Profiling and Monitoring 


Not all people respond in the same way to the same stimuli [20], and thus some patients do not respond to typical cognitive-behavior therapy in VR. Regarding human responses, Behavioral Activation System (BAS) activity is reflected in changes in heart rate, while electrodermal responses reflect behavioral inhibition system (BIS) activity [11]. While real-world exposure to the fear-provoking stimulus reliably activates both the BAS and the BIS [29], phobic individuals show weak BAS activity in contrast to an overactive BIS [15]. Similarly, VR exposure has been found to activate the BIS alone [30]. Thus, heart rate and EEG data could be collected by the VRET system to complete the patient's profile and to monitor progress systematically. Detailed post-session reports could then offer an objective, quantified basis for discussion and trigger metacognition.
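One way to turn a first-session recording into a profile of accepted ranges, and to summarize a later session for discussion, is sketched below. The paper does not specify how accepted ranges are derived, so the mean plus or minus k standard deviations rule, the default k = 2 and the sensor names are assumptions; EEG would first have to be reduced to a scalar feature, which is also assumed here.

```python
from statistics import mean, stdev


def build_profile(first_session, k=2.0):
    """Hypothetical accepted range per sensor: mean +/- k standard deviations
    of the samples recorded during the first session."""
    profile = {}
    for sensor, samples in first_session.items():
        m, s = mean(samples), stdev(samples)
        profile[sensor] = (m - k * s, m + k * s)
    return profile


def out_of_range(profile, reading):
    """Sensors whose current value lies outside their accepted range (alarm candidates)."""
    return [
        name for name, value in reading.items()
        if not profile[name][0] <= value <= profile[name][1]
    ]


def session_report(profile, recorded):
    """Share of samples outside the accepted range, per sensor, for post-session discussion."""
    report = {}
    for name, values in recorded.items():
        low, high = profile[name]
        report[name] = sum(not low <= v <= high for v in values) / len(values)
    return report


profile = build_profile({"heart_rate": [70, 72, 74, 76, 78]})
print(out_of_range(profile, {"heart_rate": 90}))                 # ['heart_rate']
print(session_report(profile, {"heart_rate": [70, 85, 90, 75]}))  # {'heart_rate': 0.5}
```

A report expressed as the share of time spent outside the patient's own range gives the therapist and patient a concrete, patient-relative number to discuss, rather than raw sensor values.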


2.5 Customization and Personalization 


It is not uncommon for therapists to want to change the VRET scenario according to their personal intuition about the problem and the needs of their patients. VRET is by no means a one-size-fits-all tool for treating all phobic populations uniformly, because such an assumption could undermine its fundamental psychotherapeutic principles. Therapists need full control over the stimuli, the duration of the exposure and the simulated world itself. Moreover, variations of the same virtual environment could help prevent patients from memorizing the simulated world and the way stimuli affect their responses (memory effect). Thus, adaptation tools should be made available to therapists rather than to VRET developers.


3 A More Flexible Approach 


The proposed approach is a set of extensions to be applied over well-established VRET methods and practices to maximize their benefits. Figure 1 presents, as a flowchart, the main components of the proposed VRET system and the way the patient's responses are regulated, as an evolution of the schema used by Moussaoui et al. [22].


Fig. 1. A basic schema of the proposed treatment rule (diagram elements: VRET System, Performance, Depth camera sensor, Therapist, Physiological Activity, Mental State)


The sufferer performs in front of a depth camera that has previously captured an image of the room (background) as a reference for non-moving objects. As long as the depth camera is not moved during a session, the system can isolate the figure of the moving actor (patient) from the static background and transfer that figure to the virtual world. At the same time, the patient can move within a small area around the initial position. Full body movements are transferred in real time (~20 fps) into the virtual world so that body language can be observed directly (usually from behind). Like a film director, the therapist uses the keyboard to control the VR environment and the quality and intensity of the stimulus. In the fear of flying scenario, for example, the therapist can create turbulence to trigger the patient's catastrophic thoughts and overreaction. The VRET alarm subsystem flashes when somatic sensor readings exceed predefined thresholds based on the patient's profile; these readings are used to monitor the flow of emotional responses during a session. Currently, there are inputs for heart rate sensors and electroencephalography (EEG), with the data transmitted wirelessly to the PC via Bluetooth.
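The figure isolation step can be illustrated with a simple depth-based background subtraction. This is a sketch under stated assumptions rather than the prototype's OpenNI-based implementation: depth maps are plain lists of rows of millimetre values, a value of 0 is assumed to mean "no reading", and the 100 mm threshold is an arbitrary example value.

```python
def foreground_mask(background, frame, threshold_mm=100):
    """True where the current depth is valid and more than threshold_mm closer than
    the reference background (or where the background had no reading)."""
    return [
        [fr != 0 and (bg == 0 or bg - fr > threshold_mm) for bg, fr in zip(bg_row, fr_row)]
        for bg_row, fr_row in zip(background, frame)
    ]


def extract_figure(color_frame, mask):
    """Keep the actor's colour pixels; None marks transparent pixels for compositing."""
    return [
        [px if keep else None for px, keep in zip(c_row, m_row)]
        for c_row, m_row in zip(color_frame, mask)
    ]


background = [[3000, 3000], [3000, 0]]  # reference depth of the empty scene (mm)
frame = [[3000, 1500], [2950, 1200]]    # the patient stands about 1.5 m from the camera
mask = foreground_mask(background, frame)
print(extract_figure([["a", "b"], ["c", "d"]], mask))  # [[None, 'b'], [None, 'd']]
```

The threshold absorbs small sensor noise (the 50 mm change at the lower-left pixel is ignored), which is why the camera must stay fixed: moving it would invalidate the reference background.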


3.1. Therapeutic Aims and Functional Requirements 


The key extension points of the therapeutic aims and functional requirements of the prototype can be summarized as follows: A) to truly decouple the supporting VRET system from the performed scenarios and the type of phobia (highly structured); B) to provide extensive reporting and monitoring of somatic symptoms via physical sensors (feedback); C) to enhance the feeling of presence and metacognition, given their importance for treatment success; D) to be adaptable to the needs of specific scenarios in order to treat a heterogeneous set of phobias in individuals (personalization).


3.2 Feeling of Presence and Metacognition Amplifiers 


After a period of practical experimentation (Nov. 2012 to May 2013), we achieved VRET-scenario decoupling, sensor data reporting and personalization using client profiles. Table 1 briefly presents the approach followed for each challenge encountered, organized by the factors influencing the SoP as adapted by Bouchard [7] from Sadowski & Stanney [24], together with the novel methods used to achieve scenario-specific or mode-specific adaptation.


Table 1. Factors and methods used in the prototype 


Factors | Challenge | Approach | Limitations
System-related factors | Large field of view | Head movement tracking when the HMD is in use; stereoscopic display on a 3DTV when Kinect is used | LCD screen sizes; delays during fast HMD movements
System-related factors | Convincing level of realism | Built-in virtual laptop presentations | The virtual laptop plugin can load ppt files only
Ease of interaction | Highly synchronous interaction | Self-video VR projection in LCD mode using Kinect; intuitive orientation and short-distance navigation | Narrow navigation area when using Kinect; self-projection in VR not appropriate for body shape concerns
User-initiated control | Direct user-initiated control; therapist-initiated control | The system responds to sensor input based on zones of accepted values; interruptions allowed by the therapist (having the highest priority) | Lack of previously captured physical and EEG input during the first session
Objective realism | High quality of stimuli | Immersive, prioritized stimuli (continuity, consistency, connectedness and meaningfulness) | Known VR technology limitations
Social factors | Interaction with other avatars | Acknowledging the existence of other passengers / audience | Limited artificial intelligence
Social factors | Observation of others' reactions when exposed to stimuli | Restrained reaction of passengers and crew during turbulence in flight scenarios; crowd reactions as a result of the collective identity | -
Duration of immersion | Avoid unnecessarily prolonged immersion | Time slots with quantized duration depending on the performed scenario | Lack of familiarization
Duration of immersion | Familiarization with the system | Demo or introduction mode, which implements VR exposure without the stimuli (easy flight or idle audience) | [illegible] familiarization with the system
Internal factors | Individuals' characteristics | User profiles with accepted ranges of sensor input (physical and EEG data) based on the first session | Noisy user profiles (low accuracy, narrow testing periods, human factors)
Side effects | Eliminate motion sickness, to avoid dizziness in returning participants | Immobilized virtual camera for the public speaking scenario; motion sickness eliminated by removing camera rotations during the flight scenario | If the fear is caused by fear of dizziness (not the turbulence), the stimuli cannot be realistically reproduced


4 Performance Situations and Content Development 


Using VRET to treat phobias is a stepwise procedure aimed at eliminating the distance between the sufferer's desired response and the actual one. The procedure is partially programmed a priori by the therapist during scenario preparation. This is made possible by a simple additional software tool that generates scenario files to be used later by the VRET system. Scenario files follow a simple XML schema that describes the elements and attributes of the VRET execution over specific virtual scenes.
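The paper does not publish its XML schema, so the element and attribute names below (`scenario`, `incident`, `length`, `audienceMode`, `difficulty`) are hypothetical. The sketch only shows how a scenario file of this kind could be read with Python's standard `xml.etree.ElementTree` module before the VRET runtime executes it.

```python
import xml.etree.ElementTree as ET

# Hypothetical scenario file; the real schema is not given in the paper.
SCENARIO_XML = """<scenario id="fear_of_public_speaking" difficulty="2">
  <incident name="Begin Conference" length="120" audienceMode="silent"/>
  <incident name="Angry Audience" length="60" audienceMode="aggressive"/>
  <incident name="End Conference" length="30" audienceMode="normal"/>
</scenario>"""


def load_scenario(xml_text):
    """Parse a scenario file into plain Python data for the VRET runtime."""
    root = ET.fromstring(xml_text)
    incidents = [
        {
            "name": inc.get("name"),
            "length_s": int(inc.get("length")),
            "audience_mode": inc.get("audienceMode"),
        }
        for inc in root.findall("incident")
    ]
    return {"id": root.get("id"), "difficulty": int(root.get("difficulty")), "incidents": incidents}


scenario = load_scenario(SCENARIO_XML)
print(scenario["id"], [i["name"] for i in scenario["incidents"]])
```

Keeping scenarios in data files like this is what decouples content from the VRET engine: a therapist's tool writes the file, and the runtime only needs to understand the schema.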


4.1. Scenario Preparation and Execution 


Figure 2 shows the interface of the scenario preparation tool. The therapist can chain a series of short, independent incidents to create a whole session. The therapist can modify the session duration and level of difficulty, and can also intervene during the session's execution to modify the computer-controlled avatars and certain parameters in real time. The following two scenario chains were created as working demonstration content and were later used as case studies in pilot tests. Both were carefully designed by psychiatric staff with long experience in clinical psychiatry (members of the General Hospital Psychiatric Society, Greece). The models were developed by experienced computer scientists and artists, based on the detailed scenarios provided by the psychiatric staff.
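Chaining short incidents into a session, with an adjustable overall duration and difficulty, can be sketched as a simple timeline. The `Incident` type, the 1 to 5 intensity scale and the scaling and offset parameters are illustrative assumptions, not the tool's actual data model; a real-time intervention by the therapist would amount to editing the remaining entries of such a timeline.

```python
from dataclasses import dataclass


@dataclass
class Incident:
    name: str
    length_s: float
    intensity: int  # hypothetical difficulty scale from 1 (mild) to 5 (strong)


def build_session(incidents, duration_scale=1.0, difficulty_offset=0):
    """Chain incidents back to back into (start, end, name, intensity) entries."""
    timeline, t = [], 0.0
    for inc in incidents:
        length = inc.length_s * duration_scale
        level = max(1, min(5, inc.intensity + difficulty_offset))  # clamp to the scale
        timeline.append((t, t + length, inc.name, level))
        t += length
    return timeline


def active_incident(timeline, t):
    """Return (name, intensity) of the incident playing at time t, or None after the end."""
    for start, end, name, level in timeline:
        if start <= t < end:
            return name, level
    return None


session = build_session(
    [Incident("Begin Conference", 120, 1), Incident("Angry Audience", 60, 4),
     Incident("End Conference", 30, 2)],
    duration_scale=0.5, difficulty_offset=1,
)
print(active_incident(session, 95))  # ('End Conference', 3)
```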


[Screenshot panels: Scenario Properties, Selected Video Properties, Timeline Navigation and Timeline, with incidents such as Begin Conference, Angry Audience and End Conference]
Fig. 2. The VRET scenario maker, used by therapists to prepare automated scenario sessions


4.2 The Fear of Public Speaking Scenario 


Figure 3 is a view of the virtual conference room used in the fear of public speaking scenario. A virtual laptop is available for running the client's custom presentation (especially useful when the HMD is in use) and strengthens the feeling of presence by making the presentation flow more realistic. The behavior of the computer-controlled avatars is defined by the scenario, but is affected to some degree by their position (distance from the speaker). Virtual characters sitting in front of the speaker exhibit more detailed behavior and appearance. Moving towards the far end of the conference room, there are three zones: A) 3D models with skeletal animation, bone-based facial expressions and lip synchronization; B) virtual persons who participate as 2D animations; and C) at the far end, only static figures that can perform idle or imperceptible horizontal movements. The intelligence of the avatars follows their level of visual detail, so those seated at the front are more interactive than those seated further back. Currently, the idle, silent, normal, look bored, noisy and aggressive modes are available for the audience.
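The distance-based zoning of the virtual audience can be sketched as below. Only the three zones and the six audience modes come from the text; the 4 m and 8 m zone boundaries and the rule that zone C figures always stay idle are assumptions made for illustration.

```python
AUDIENCE_MODES = ("idle", "silent", "normal", "look bored", "noisy", "aggressive")


def zone_for(distance_m, near_m=4.0, far_m=8.0):
    """Zone A: 3D skeletal avatars; zone B: 2D animations; zone C: static figures."""
    if distance_m < near_m:
        return "A"
    if distance_m < far_m:
        return "B"
    return "C"


def avatar_behaviour(distance_m, audience_mode):
    """Apply the scenario's audience mode, limited by the avatar's zone."""
    if audience_mode not in AUDIENCE_MODES:
        raise ValueError(f"unknown audience mode: {audience_mode}")
    zone = zone_for(distance_m)
    # Assumed rule: far-away static figures only ever show idle, barely visible motion.
    mode = "idle" if zone == "C" else audience_mode
    return zone, mode


print(avatar_behaviour(2.0, "aggressive"))   # ('A', 'aggressive')
print(avatar_behaviour(10.0, "aggressive"))  # ('C', 'idle')
```

Tying behavioral detail to distance concentrates rendering and animation effort where the speaker is most likely to look, which matches the paper's description of more detailed, more interactive avatars at the front.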


4.3. The Fear of Flying (Turbulence) Scenario 


Figure 4 depicts what the patient sees from their own perspective (in stereo mode). This scenario was created for people who fear flying and believe in catastrophic consequences of turbulence. During the flight, other passengers look and behave naturally, while the crew offers beverages. The therapist can choose to create discomfort at any time. In auto-mode, the intensity and quality of the stimuli can be raised or lowered by the artificial intelligence of the VRET system.
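The paper does not describe how auto-mode decides to raise or lower the stimuli. One simple possibility, assumed purely for illustration, is a rule that steps the intensity down when heart rate exceeds the patient's accepted range, steps it up while heart rate stays in the lower half of that range, and otherwise holds it.

```python
def next_intensity(level, heart_rate, accepted, max_level=5):
    """Hypothetical auto-mode rule for one control step (levels 0..max_level)."""
    low, high = accepted
    if heart_rate > high:
        return max(0, level - 1)  # above the accepted range: ease off
    if heart_rate < (low + high) / 2:
        return min(max_level, level + 1)  # comfortably low: raise the stimulus
    return level  # upper half of the range: hold


print(next_intensity(2, 110, (60, 100)))  # 1
print(next_intensity(2, 70, (60, 100)))   # 3
```

Stepping by one level at a time keeps changes gradual, in line with graded exposure; the therapist's manual control would still override such a rule.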




Fig. 3. The conference room captured at a moment when the audience expresses disapproval (aggressive mode)


Fig. 4. The flight scenario viewed in stereo from patient’s perspective 
4.4 Tools Used for Content Development, Rendering and Projection


The content of the supported case studies, mainly 3D objects and avatars, was developed with SketchUp 8.0 and iClone 5.0. Scenes were imported into the Unity game engine (version 4) so that they could be projected through the HMD (Virtual Research v.8) with a high degree of realism. The depth camera used in the pilot studies was the Microsoft Kinect, and the OpenNI SDK was used to develop the 3D sensing middleware interface. The body movement sensing functionality, developed in Visual Studio, was exported as a dynamic link library (DLL) to the front-end application. The scenario development tool and the front-end application for stereoscopic projection on 3DTVs were developed in Delphi.


5 The Evaluation 


Since the prototype extends the features and functionality of existing VRET approaches, the evaluation of the first working demonstrator aimed at assessing the proposed approach. The prototype was evaluated by a panel of twenty-eight professionals (N=28), 18 women and 10 men, with a mean age of 47.73 (SD=13.16). Eleven of them were psychiatrists (medical doctors), while the rest were clinical and counseling psychologists, including students (14.28%). The prototype used in the pilot study was a mature version that supported the two scenarios described earlier, the depth camera, and both the 3DTV and HMD versions through a dual graphics output. The output was rendered in full HD, in a 16:9 aspect ratio. In the 3DTV version, it was viewed from a distance of 1.5 m (marked by a colored area on the demo room floor).


Table 2. Responses on the elements of the questionnaire (Likert scale: 1-5) 


# | Question | Mean | SD
B1 | How familiar are you with Virtual Reality technology? | 2.25 | 1.11
B2 | How often have you used similar applications in the context of your professional obligations, research or your studies? | 1.57 | 0.92
Q1 | Are the stated aims and goals of the modules obvious and intuitive? | 3.85 | 0.77
Q2 | Is the workflow of loading and control of new modules intuitive and without problems? | 3.13 | 1.45
Q3 | Did the modules you have tested so far cover your expectations of Virtual Reality Exposure Therapies? | 3.48 | 1.92
Q4 | Did the overall system work as expected? | 3.6 | 1.10
Q5 | Are you satisfied with the quality of content (graphics, realism)? | 3.85 | 1.24
Q6 | Using the VR tools, do you think your effectiveness as a therapist will be increased? | 3.79 | 1.94
After 5-10 minutes of experimentation with the VRET system, participants were asked to fill in a questionnaire, the results of which are presented in Table 2. Respondents were familiar with VRET, but their lack of personal experience resulted in a mean of 2.25 for question B1 (SD=1.11) and 1.57 for B2 (SD=0.92). Q1 made it clear that the VRET prototype was perceived as obvious and intuitive; its mean of 3.85 (SD=0.77) was the highest in the questionnaire, equal to that of Q5. In Q2, participants found the workflow of loading and using modules to be intuitive and free of problems (M=3.13, SD=1.45). In Q3, good expectations of the system were reported (M=3.48, SD=1.92). The prototype worked as expected (Q4, M=3.6, SD=1.10) and the level of satisfaction with the content was very encouraging (Q5, M=3.85, SD=1.24). Based on their demonstration experience, testers believed that the proposed VRET system could increase their effectiveness as therapists (Q6, M=3.79, SD=1.94).

The final open questions aimed to capture missing functionality (Q7: In your opinion, what features or functionality are missing from the system or its modules?) and to gather feedback on the time and effort needed to learn how to use the system (Q8: Make a comment on the time & effort needed to learn the tools). Apart from noting that the head tracking mechanism of the HMD was not available during the demonstration, most therapists did not identify missing features. Two therapists mentioned that VRET systems cannot reveal much about the etiology of a specific phobia. However, some believe that knowing the reasons behind the onset of the phobia is not necessary to complete the treatment [14]. In Q8, most therapists (80%) indicated that the VRET prototype was fairly easy or very easy to learn. A good learning curve and a low price, they said, would be necessary for a future investment.


6 Conclusions 


VRET is used in phobia treatment as a tool for anxiety disorders that severely impair patients' socialization, professional activity and quality of life. Building on the progress VRET has made over the last decades, we propose an extension to well-known approaches that enhances the Sense of Presence (SoP), decouples content from VRET functionality by letting therapists prepare scenarios themselves, and supports multiple sensors as objective measures of anxiety levels. Unlike Virtually Free, developed by Green, Flower and Fonseca [12], which uses mobile technology, it is not a therapist-free solution.

Novel additions to the overall architecture are personalization (user profiles) and depth camera sensors, which project the patient into the simulated world in real time and leverage higher mental processes such as social self-awareness and metacognition to amplify the benefits of VR as a therapeutic modality. Realism was given attention, but not to an extent that would raise development costs, since a VRET system can be effective even at a low representational level [27]. Therapists are expected to use such VRET sessions before exposure to real-life situations.

We believe that the proposed approach is suitable for certain types of specific phobias, as classified by existing diagnostic systems such as the Diagnostic & Statistical Manual of Mental Disorders of the American Psychiatric Association [8]. Although the target audience of this study was therapists, as users of the VRET system, future clinical use with people who suffer from phobias will be necessary to confirm the usability of the prototype and the findings of the literature regarding the therapeutic use of VRET.


Acknowledgement. This work is supported by the EU funded project VERITAS 
(FP7 247765). 
Abstract. Cognitive rehabilitation from a functional perspective often requires intensive training over a long period of time. In the rehabilitation of unilateral neglect, the required frequency and intensity are expensive and difficult to achieve for both therapists and patients. For this reason, this case study tests the possibility of using computer-based training in the rehabilitation of a patient with severe neglect who had no previous computer skills. The article describes the results of the training both in terms of neuropsychological tests and the reading ability of the patient.


Keywords: optokinetic training, home training, computer-based training, unila- 
teral neglect, prism adaptation training, bottom-up. 


1 Introduction 


“All I want is to be able to read again.” These were the first words of the patient PK when I met him in July 2013. PK had fallen down a flight of stairs in March of the same year and had been in care and rehabilitation for almost 4 months prior to this meeting. Although his behavior showed textbook neglect to a degree rarely seen 4 months after injury, he also demonstrated an impressive ability to maintain an artistic composition in memory and the will to fight his way back to life.

The MRI and CT scans showed no apparent recent injury. However, PK had a severe and poorly treated renal condition as well as a history of previous infarcts. From the MRI and CT scans it was immediately clear that PK had shown an extraordinary ability to overcome the effects of the previous injuries, despite the apparent extent of the physical damage.

In this paper, I will try to illustrate how computer-based training was used in the patient's home to achieve the intensity needed for bottom-up cognitive training to take effect. I will describe the injury, the assessment, the training and the results so far in detail. I will outline the tools used for assessment as well as the computer-based training, and show how PK's reading ability changed over time as his speed of visual perception improved. The paper will demonstrate how cognitive rehabilitation of neglect may benefit from intensive home training using computer-based prism training, optokinetic training and scanning training, but also how much this demands of the patient and the therapist.


2 Etiology


PK is a 75-year-old male with an academic education as a geologist. PK is a
renowned artist, painter, sculptor and essayist and has travelled extensively around
the world, completing the latest of 16 polar expeditions, to the Arctic areas of
Greenland, in 2011.

In 2000, PK suffers a sudden, large intracerebral hemorrhage in the right parietal
lobe. A four-centimeter hematoma forms in deep tissue and an emergency
evacuation has to be performed. Although subsequent CT scans reveal extensive
damage to the right parietal and temporal lobes, PK recovers fully over time and is
able to return to work after a brief period of recovery.

In 2009, PK has another cardiovascular incident on a trip to Greece. Subsequent
CT scans reveal ischemic changes in the left temporal-occipital lobe. Neuropsychological
testing confirms that PK has lost color vision and the ability to recognize faces, and
has an upper right quadrantanopia, unilateral neglect and reduced reading ability.

The hospital records indicate that PK demonstrates symptoms of neglect both after
the first incident in 2000 and after the second in 2009. PK is offered assistance and
rehabilitation in 2009, but he declines, and after some months of recovery he is able
to resume his artistic work as a painter, sculptor and essayist. According to him and
his wife, he never recovers from the prosopagnosia, but his color vision returns to
normal after a while.

In early spring 2013, PK accidentally falls down a flight of stairs in his home,
suffering a contusion. CT and MRI scans reveal only small superficial injuries and no
new major incidents, but PK is severely disoriented and the old neglect symptoms
return in full force. Prolonged hospitalization is required due to a severe inflammatory
renal condition, and the treatment seems to further aggravate the neuropsychological
deficits. In July 2013, PK is released from hospital with severe neglect and left-sided
hemiparesis, which confines him to a wheelchair.


3 Unilateral Neglect 


Neglect is a cognitive attention deficit defined as a failure to respond to, attend
to, report, or orient toward stimuli presented in the contralesional side of space, which
cannot be attributed to primary motor or sensory dysfunction [1, 2]. Space, in this
context, should be understood in the broadest sense of the word. It includes events
in the physical environment beyond arm’s reach of the patient (extrapersonal space),
the immediate surroundings (peripersonal space), the body itself (personal space) [3]
and internal representations of the body (the proprioceptive model) [4]. In addition
to a particular spatial domain, neglect may be observed in different midline frames
of reference: one is viewer-centered, in which the neglected area is positioned
relative to a midline projection from the retina, the head or the torso; the other is
allocentric, in which the neglected area is positioned relative to the stimulus or
object [5].


3.1 Symptoms of Neglect


Neglect is a challenging syndrome in that it leaves the patient unaware of the
consequences and effects of the impairment [6]. Patients will nevertheless often
complain about bumping into things, not being able to locate objects in their homes or
bruising the contralesional side of the body because of the inattention. The ability to
read may also be affected in various ways, at either word or sentence level [7]. The
most common behavior of neglect patients is extinction, which is the inability to
detect stimuli presented to the contralesional side if stimuli are presented
simultaneously to the ipsilesional side [8]. Extinction has been demonstrated in
different modalities with visual, auditory or somatosensory stimuli, either individually
or in combination [e.g. 2, 9, 10].


3.2 Neural Correlates of Neglect


The diversity in neglect symptoms reflects the degree to which attention depends on
different neural mechanisms [11], and as a consequence different types of lesions may
trigger one or more neglect behaviors. Neglect is often characterized as a
contralesional impairment, and it is more frequently observed with right hemisphere
damage than with left hemisphere damage [12-14].

The most common causes of neglect are lesions to the right posterior parietal cortex
[15-17], but damage to the inferior temporal region and the superior/middle temporal
gyri has also been found to correlate with neglect [18]. In a recent study, Verdon et
al. [19] found that damage to the right inferior parietal lobe was correlated with
perceptive and visuospatial components of neglect. They also found that damage to the
right dorsolateral prefrontal cortex was correlated with impairments in
exploratory/visuomotor components and, finally, that damage to deep temporal lobe
regions was associated with allocentric/object-oriented neglect.


3.3 Prevalence


Neglect is a fairly common cognitive impairment in patients with brain injury.
Across studies, there seems to be ample agreement that neglect behavior fades
rapidly, and after 3-4 weeks only approx. 8-10 % of patients will test positive for
neglect [20]. Long-term chronicity of neglect does not seem to correlate with sex,
handedness or lesion volume, but both the severity and persistence of neglect
increase with age [13, 21]. Right hemisphere lesions have been found to cause
neglect symptoms that are more persistent and less responsive to spontaneous
remission [18] and therapy [22]. The severity of the neglect behavior in the acute
stages of injury has been found to be a strong predictor of the severity of
symptoms a year post onset [23]. Finally, visual field disturbances and defects
have been shown to be more prevalent amongst patients with chronic neglect [23].


332 LL. Wilms 


4 Assessment of PK 


It is always a challenge to assess all aspects of a multifaceted syndrome like neglect.
The cause as well as the expression of neglect may vary from patient to patient, and
symptoms fade and change over time as patients acquire compensatory techniques
such as positioning their body or head differently when solving tasks. In the
case of PK, assessments after previous incidents had established that neglect was
present. The task now was to ascertain its current level and to choose tests that
would guide the choice of training and be sufficiently sensitive to measure
progress. For this reason, a combination of tests was used to determine the type,
extent and severity of the neglect and to distinguish perceptual from spatial neglect,
as the literature indicates that the effect of training differs depending on the type [24].
The choice of tests also took into consideration that we wanted to avoid fatiguing
the patient when administering them.

Schenkenberg’s line bisection [25] was chosen to assess both perceptual and
visuomotor neglect. In this test, 17 horizontal lines of various lengths have to be
divided at the middle. In the visuomotor task, the patient is asked to divide the lines
by making a mark. In the perceptual task, the therapist moves a pencil along each line
from left to right and the patient indicates orally when the middle of the line is
reached. Next, the Mesulam cancellation tasks [26], including both the letter and the
object cancellation tasks, were used to assess neglect behavior. The baking tray test
[27, 28] was used to assess spatial neglect, and the computer-based Test of Attentional
Performance (TAP) (subtests visual field test and neglect test) was used to assess the
visual field, extinction and the processing speed of the perceptual system. Due to PK’s
initially reduced performance, a special version of the TAP test was used in which the
detection period was extended to 10 seconds per trial for the first two tests. A simple
estimation test was used to confirm perceptual neglect [24]. Finally, copying pictures
of a star, a flower and a cube was used to test for visuospatial difficulties.
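The results section reports line bisection performance as an average deviation from the center of the line in percent, but the chapter does not state the scoring formula. The sketch below assumes the convention commonly used with Schenkenberg’s test (left segment minus true half, divided by the true half, times 100, so that positive values mean a mark placed right of center); the function names and the example measurements are hypothetical.

```python
def percent_deviation(left_segment_mm: float, line_length_mm: float) -> float:
    """Deviation of one bisection mark from the true center, in percent.

    Assumed convention: positive = mark placed right of center
    (the typical error in left neglect), negative = left of center.
    """
    true_half = line_length_mm / 2.0
    return (left_segment_mm - true_half) / true_half * 100.0


def mean_percent_deviation(trials: list[tuple[float, float]]) -> float:
    """Average deviation over (left_segment_mm, line_length_mm) pairs."""
    if not trials:
        raise ValueError("no trials to score")
    scores = [percent_deviation(left, length) for left, length in trials]
    return sum(scores) / len(scores)


# Hypothetical measurements for three of the 17 lines.
example_trials = [(60.0, 100.0), (95.0, 160.0), (40.0, 80.0)]
print(round(mean_percent_deviation(example_trials), 2))  # 12.92
```

Expressing the deviation relative to half the line length makes scores from lines of different lengths comparable before they are averaged.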

These tests have been used to assess progress throughout the training period and
have been administered whenever major changes to training were introduced. The
scores from the tests can be found in Section 6.

All tests indicated severe egocentric visuomotor and perceptual neglect along with
highly reduced processing speed and difficulties in combining visual stimuli into a
usable percept.


5 Training of PK 


Almost immediately upon arrival at the Center for Rehabilitation of Brain Injury, PK
started intensive physiotherapy training for at least 1.5 hours a day, 4 days a
week. He still maintains this practice 6 months later. He was mobile and out of the
wheelchair after 3 months and is now able to walk about without support. Due to the
intensity of the physical training, PK needed a long daily break before starting any
other training. We discussed the requirements for intensive daily cognitive training,
and together with PK we decided that training at home would offer the best flexibility
for PK.

Apart from the neglect, the most severe problem observed in PK was the reduced
processing speed of the perceptual system (Fig. 2). We therefore chose a bottom-up
training strategy to try to ameliorate as many of the basic problems as possible. No
single treatment has been demonstrated to be effective for all types of neglect [29];
in the latest report on rehabilitation after brain injury from the Danish Board of
Health [30], an analysis based on 17 papers concludes that the best effect in the
treatment of neglect is achieved through a combination of therapies.

In 1998, Rossetti et al. published a seminal study which demonstrated that
exposure to prism adaptation might alleviate some of the symptoms related to
egocentric visual neglect, regardless of the severity of neglect [31]. Internal data used
to interpret sensory feedback from different modalities must be kept in alignment to
ensure that action and attention are directed towards the same location [32]. Rossetti
et al. hypothesized that the visuomotor realignment of the internal representation of
the personal midline observed in standard prism exposure studies might alleviate
symptoms of neglect. Prism Adaptation Therapy (PAT) has since become one of the
most promising therapies in the treatment of egocentric visual neglect [33-36].

Since PK had shown visuomotor problems, we decided to start with PAT twice
a day for two weeks. The author provided a computer-based prism adaptation system
for the purpose of training and follow-up. In this version of computerized PAT, the
patient performs three training sets at each of the two daily sessions. In the first set,
the patient performs 30 pointing trials on a touch monitor, 10 trials at each of three
locations, with no visual feedback. This set measures baseline performance for the
session. In the second set, the patient performs 90 pointing trials, 30 at each of the
three locations, this time wearing prism goggles. The goggles shift visual input
10 degrees to the right. At the end of each trial, the patient receives terminal
feedback (seeing his fingertip when touching the monitor) and is asked to attempt to
adjust to the deviation. In the final set, the patient removes the prism goggles and
performs an additional 60 pointing trials, 20 at each of the three locations, again
without feedback. The aftereffect of the prism exposure is measured to determine
whether adaptation is taking place. Data are collected and stored on the computer for
each trial, set and session for later processing. PK could not administer PAT training
on his own, so helpers and his spouse were trained by the author to assist PK during
the two weeks of training.
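The three-set session structure and the aftereffect measure described above can be sketched as follows. This is not the author’s system: the set labels, the `Trial` record and the use of a horizontal pointing error in degrees (positive to the right of the target) are assumptions made for illustration, and the aftereffect is taken here simply as the mean post-exposure error minus the mean baseline error at each location.

```python
from dataclasses import dataclass
from statistics import mean

# Trials per target location in each set, as described in the text:
# 3 locations x (10 baseline, 30 with prisms, 20 post-exposure).
TRIALS_PER_LOCATION = {"baseline": 10, "prism": 30, "post": 20}
N_LOCATIONS = 3


@dataclass
class Trial:
    set_name: str     # "baseline", "prism" or "post" (hypothetical labels)
    location: int     # target index 0..2
    error_deg: float  # horizontal pointing error; positive = right of target (assumed unit)


def trials_per_set() -> dict[str, int]:
    """Total trials per set: 30 baseline, 90 prism, 60 post."""
    return {name: n * N_LOCATIONS for name, n in TRIALS_PER_LOCATION.items()}


def aftereffect_by_location(trials: list[Trial]) -> dict[int, float]:
    """Mean post-exposure error minus mean baseline error, per location.

    With rightward-shifting prisms, adaptation is expected to show up as a
    leftward (negative) shift in pointing once the goggles are removed.
    """
    result = {}
    for loc in range(N_LOCATIONS):
        base = [t.error_deg for t in trials if t.set_name == "baseline" and t.location == loc]
        post = [t.error_deg for t in trials if t.set_name == "post" and t.location == loc]
        if base and post:
            result[loc] = mean(post) - mean(base)
    return result


# Hypothetical session: pointing is 1 degree right at baseline, 2 degrees left after prisms.
session = [Trial("baseline", loc, 1.0) for loc in range(N_LOCATIONS) for _ in range(10)]
session += [Trial("post", loc, -2.0) for loc in range(N_LOCATIONS) for _ in range(20)]
print(aftereffect_by_location(session))  # {0: -3.0, 1: -3.0, 2: -3.0}
```

Comparing against the same session’s own baseline, rather than a fixed zero, removes any pointing bias the patient already has before the prisms are put on.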

As PK had also demonstrated perceptual neglect problems and reading difficulties,
it seemed appropriate to try computer-based optokinetic stimulation, in which patients
are asked to attend to targets on a background moving towards the left [37-39]. The
system EyeMove from www.medicalcomputing.de was chosen based on its documented
results [40, 41]. Rather than using the preset versions for training, we started out with
a single dot moving towards the left at three preset speeds. After a week, the speed,
size and number of moving objects were adjusted to ensure that PK was practicing at
the limit of his ability. PK trained once a day for 45 minutes for three weeks, and
after 6 weeks PK managed to train at the highest level of difficulty. After the
first three weeks, we added a picture naming task using the computer-based system
“Afasi-assistant” from www.afasi-assistent.dk, where the task was to read a word
and find the matching object amongst first 2 and later 4 pictures. In November,
we added cancellation training using the iPad app “Visual Attention” from the
TherAppy suite by the company www.tactustherapy.com. Table 1 summarizes the
training schedule.
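The weekly adjustment of speed, size and number of moving objects was done by the therapist, not by software. As a purely hypothetical illustration of such a rule, the sketch below raises or lowers difficulty one step at a time from a hit rate; the `Level` fields, the thresholds and the ordering of steps are assumptions and do not correspond to EyeMove settings or to any API of that product.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class Level:
    speed: int      # 1..MAX_SPEED; hypothetical steps, not EyeMove's own settings
    n_targets: int  # number of moving objects on screen
    size: int       # 1 = smallest object; larger values are easier


MAX_SPEED = 3  # the text mentions three preset speeds


def next_level(level: Level, hit_rate: float,
               upper: float = 0.85, lower: float = 0.60) -> Level:
    """Weekly rule: harden if clearly too easy, ease if clearly too hard.

    Thresholds are illustrative assumptions. Hardening order: faster speed,
    then smaller objects, then more objects. Easing order: fewer objects
    first, then slower speed. Otherwise the level is kept unchanged.
    """
    if hit_rate >= upper:
        if level.speed < MAX_SPEED:
            return replace(level, speed=level.speed + 1)
        if level.size > 1:
            return replace(level, size=level.size - 1)
        return replace(level, n_targets=level.n_targets + 1)
    if hit_rate < lower:
        if level.n_targets > 1:
            return replace(level, n_targets=level.n_targets - 1)
        if level.speed > 1:
            return replace(level, speed=level.speed - 1)
    return level


start = Level(speed=1, n_targets=1, size=3)  # single dot, slowest preset
print(next_level(start, hit_rate=0.90))  # Level(speed=2, n_targets=1, size=3)
```

Changing one parameter per step keeps each adjustment small, which mirrors the aim stated above of keeping the patient practicing at the limit of his ability.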

Once a week, the training regimen was adjusted by the therapist. On a daily basis,
the spouse or hired helpers would assist PK in starting the appropriate application.


Table 1. Training regimen at home. Training was adjusted weekly to constantly challenge the 
ability of the patient. 


Type                                     Period     Intensity
Test 1
Prism adaptation training                2 weeks    2 x 30 minutes, daily
Test 2
Prism adaptation training                1 week     1 x 30 minutes, daily
Optokinetic training 1                   3 weeks    45 minutes, daily
Test 3
Optokinetic training 2                   Ongoing    45 minutes, daily
TherAppy Visual Perception               4 weeks    15 minutes, daily
Afasi-assistent, object determination    4 weeks    20 minutes, daily
Test 4


PK has since continued to practice with the optokinetic system every morning, as
he feels that it “warms up” his perceptual system and further reduces the perceptual
effects of neglect for a period of 30-60 minutes after practice.


6 Results so Far


As can be seen in Table 1, PK was tested before and after each major change in
training. The results from the line bisection tests before and after the training are
shown in Fig. 1. PK’s scores differ markedly between the two tests, which is
indicative of separate systems being activated in the bisection task [42]. PK improved
on both tests after PAT (Test 2) and on the perceptual part after the optokinetic
training (Test 3). However, Test 4 indicates that the effect has not been stable,
although PK is still improving, but at a slower rate.

The cancellation tasks (Table 2) show some improvement at Test 3, but at Test 4 the
effect on the left has disappeared.


[Figure: Line Bisection Test Results. Average deviation from center (%) for the Motor and Perceptual conditions, plotted per test.]


Fig. 1. The results from the line bisection tasks. “Motor” indicates the result where PK set the
mark with a pencil, and “Perceptual” the result where the therapist set the mark on PK’s request.


Table 2. Results from the Cancellation tasks over time 


        Figure cancellation                  Letter cancellation
        Upper   Lower   Upper   Lower        Upper   Lower   Upper   Lower
        left    left    right   right        left    left    right   right
Test 1  N/F     N/F     N/F     N/F          N/F     N/F     N/F     N/F
Test 2  N/F     N/F     N/F     N/F          0       0       1       5
Test 3  2       1       6       4            1       0       5       8
Test 4  0       0       7       7            0       3       7       7


The baking tray test improved dramatically after PAT (Table 3), and at the most
recent test, all 16 “buns” were spread out evenly across the “tray”.


Table 3. The results from the baking tray test 


        Left    Right   Comment
Test 1  0       16
Test 2  8.5     7.5     Skewed right
Test 3  7       9       Still skewed towards right
Test 4  8       8       Spread all over the plate


The TAP test was used in an attempt to establish whether the visual field was
intact. It also provides data on processing speed by measuring the time from
stimulus onset until the patient presses the button. Albeit a rough estimate, it is still
a good indicator of the overall processing speed of the perceptual system. The results
over the training period are shown in Fig. 2.
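TAP computes its own statistics, and the chapter does not describe how responses outside the extended 10-second detection window were handled. The sketch below only illustrates one way to summarize raw response times under such a window; treating late or missing presses as omissions and reporting the median of the remaining hits are assumptions, and the function name is hypothetical.

```python
from statistics import median

DETECTION_WINDOW_MS = 10_000  # extended 10-second detection period used for PK


def summarize_reaction_times(rts_ms, window_ms=DETECTION_WINDOW_MS):
    """Split raw trials into hits and omissions and summarize the hits.

    rts_ms: one entry per stimulus; None means no button press was recorded.
    Presses slower than the detection window are counted as omissions.
    """
    hits = [rt for rt in rts_ms if rt is not None and rt <= window_ms]
    return {
        "n_trials": len(rts_ms),
        "omissions": len(rts_ms) - len(hits),
        "median_ms": median(hits) if hits else None,
    }


# Hypothetical trials: two misses (no press, and a press after 12 s).
print(summarize_reaction_times([800, 1200, None, 12_000, 1000]))
# {'n_trials': 5, 'omissions': 2, 'median_ms': 1000}
```

The median is used here because a few very slow responses, common in patients with severely reduced processing speed, would otherwise dominate a mean.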

Fig. 2. The change in response time to stimuli presented during the TAP test. Changes between
Test 1, Test 2 and Test 3 were highly significant (F(3, 69) = 35.1, p < 0.005).


Reading ability was monitored by administering reading tests. At Test 1, PK was
not even able to read two-letter words. At Test 2, PK still had trouble reading even
single words. After the optokinetic training, PK was able to read poetry and shorter
pieces of prose again (Test 3). However, he still has some problems keeping track of
the lines, loses his position in the text and has to use his finger or a ruler to keep
track.

PK’s ability to draw from memory has been intact for almost the entire training
period. His performance on copying drawings has improved, so PK is now able to copy
a star, a cube and a flower. Introducing new drawings at Test 4 did, however, reveal a
bias to the right in one out of three drawings.

The most encouraging improvements so far have been in relation to PK’s work as an
artist. He has been able to resume his work as an artist, and the most recent
improvement has been intermittent periods of restored color vision and absence of
neglect when painting. Previously, he could not leave his work even briefly, as he was
unable to recognize it from visual input alone. Although he still cannot recognize
older pieces of work as his own, he is able to return to current work in progress
and recapture how far he has come, using visual cues from the painting. We will keep
monitoring overall progress for the next 6 months.


7 Discussion and Conclusion 


In this study, we tested whether training of severe neglect could be accomplished by
setting up training systems in the home of a patient with no previous experience in the
use of computers.

The first obstacle was the need for assistance in initiating the daily PAT training.
Although the program could be started with one click, moving boxes back and
forth and putting on the prism goggles required assistance from the spouse and local
helpers, who had to be trained in the execution of PAT. It was fairly expensive, but it
did allow PK to train as intensively as required. The optokinetic training was much
easier to use, although it was impossible to start the program without going through
several menus. Being unable to read, PK could not start the program by himself for
many weeks and needed assistance from helpers. The Afasi-assistant could be
set up to start with only one click, and so could the TherAppy app on the iPad.

PK was and still is extremely motivated for training. He has meticulously trained
almost every day and has been good at stating when training became too easy or
required adjustments. Adjustments to the programs were made on a weekly basis by a
visiting therapist (the author), and this worked well for all parties. The home visits
provided an opportunity to observe and respond to changes and improvements in
activities of daily living. When asked about the advantages of being able to train at
home, both PK and his wife stated that, above all, the flexibility of being able to train
when time and strength allowed was very important for keeping up motivation to
train. The disadvantage was the requirement for hired help. Using the spouse as
assistant trainer was not a success; it created marital conflict and aggravation, to the
disappointment of both parties. The reason for this is currently being investigated and
will be dealt with in a subsequent paper.

Often, patients have to practice once or twice daily for 2-5 weeks, and the training
needs to be adjusted frequently as function and processing speed improve. It has
been pointed out many times that computer-based training offers solutions to these
challenges, and advances in AI algorithms and online profiling will eventually
alleviate the adjustment challenges. However, even fairly simple computer-based
training like PAT and optokinetic training will require assistance to start up the
programs, adjust the equipment and monitor the progress of the patient.


Acknowledgement. I wish to thank Doctor Georg Kerkhoff for valuable advice and 
recommendations in the setup and execution of the EyeMove system. 


Abstract. This paper analyses the use of augmented reality in advanced project-based
training in design. Our study considers how augmented environments can
contribute to this type of group training: what types of interaction spaces
constitute these new learning environments, and how are these spaces constructed
so as to promote collective reflection?


Keywords: Project based learning, collaborative design, augmented reality. 


1 Context and Hypotheses 


We propose to study the use of Augmented Reality (AR) in advanced project-based 
training. Following the taxonomy proposed by Dubois [1], AR is considered here to 
be the interactive and non-immersive real-time superposition of virtual information in 
a real environment. The aim of project-based training in this context is to develop the 
learners’ general and specific skills to devise complex projects in design, architecture 
and engineering [2]. In the present study, this type of training will be implemented 
through group activities based on group dynamics. 

Indeed, the contemporary designer no longer works alone on projects; rather, he or
she collaborates with other experts, because projects must evolve in a regulatory
framework that imposes increasingly stringent quality requirements with ever shorter
deadlines [3]. It is therefore impossible today to introduce students to the origination
of complex projects in design, architecture and town planning without training them
in collective activities. This is why we hypothesise that group activities promote
learning through collective reflection on a concrete project, as such activities are
well adapted to the integration of the knowledge and skills required to master
complex design.


2 Research Question 


The aim of this paper is to understand how augmented reality supports group dynamics
in project-based training. In other words, what types of interaction spaces constitute
these new augmented learning environments, and how are these interaction spaces
constructed so as to promote collective reflection?


3 Scope: Instrumenting Collaborative Practices 


Our teaching approach is to develop specific skills in collaborative practices, which
are clearly distinguishable from so-called cooperative activities. We consider this
distinction to be important because it implies the establishment of a specific
pedagogical framework, and it already appears in definitions of collective activities in
general, independently of the notion of design [3, 4, 5]. Despite the diversity of
definitions, all agree on one major characterization: the differences between the tasks
assigned to the project participants. As the common goal is the project, the designers
are only required to work together because of the need to access shared resources held
by individual parties. Bearing these definitions in mind, it is our opinion that design as
an activity includes solitary work as much as it includes cooperation or collaboration.
The present article will only tackle the moments in which students, teachers and
experts work together and share the real-time annotations and graphic documents
necessary for design.


4 Integration of Augmented Reality in Training Students in Collaborative Design


The application of augmented reality as understood in the present study therefore
concerns the real-time projection of virtual documents onto actual work surfaces
(tables and boards) and the creation, manipulation and annotation of those documents
using electronic pens. Such an application is here linked to how work can be shared
via a network, and it involves students, trainers and experts working together, remotely
and/or in co-presence, on an (architectural) project. It is implemented in specific
spatial configurations, and the whole is therefore covered under the title SAR (Spatial
Augmented Reality).


4.1 Presentation of the Tool Used in Collaboration


Our study concerns four different SAR. All four are based on a software solution
called SketSha (for "sketch sharing"), developed by LUCID, University of Liège [6].
SketSha is based on the metaphor of a meeting in which several people are gathered
around the same document. In addition to the social exchange between collaborators
via videoconferencing, SketSha enables the participants to share annotations and
graphical documents both in real time and remotely. Concretely, this involves the
connection, via the internet, of several digital surfaces on which users interact
graphically with an electronic pen.


Fig. 1. Collaborative system SketSha


4.2 Introduction to the SAR and their Usages 


Four SAR were installed as part of the training in collaborative design for students of
architectural engineering at the University of Liège. The workshop was attended by
about fifteen students and lasted three months, during which time the students
collaborated in real time with students from the School of Architecture of Nancy. In
these workshops, the students were also able to benefit from consultation and advice
from remote experts (architecture, structure, building engineering, fire safety, etc.)
thanks to the various SAR which had been installed as part of the course.


SAR 1 Consultation. The first SAR allows individual students to consult various
remote experts for help in developing their own project. At this meeting, students are
asked to prepare documents and to prioritize the information that they would like to
communicate to the expert regarding any questions they have. The pedagogical aim
of this SAR is to prepare the students to deal with other expertise and skills while also
providing access to other knowledge, references and experiences.


[Figure: panels illustrating the four SAR (Consultation, Collaboration, Project Review, Evaluation)]


Fig. 2. The four SAR for collective work


SAR 2 Collaboration. The second SAR brings together two geographically separate
groups of students for weekly sessions to work on the same project. At these sessions,
two or three students seated around a large graphic table collaborate remotely and in
real time with two or three students from the School of Architecture of Nancy. The
pedagogical objective of this SAR is to introduce students to coordination (as regards
public speaking and the joint production of annotations and graphic documents) and
to sharing their own opinions, in which they explain, negotiate and justify their
choices, so as to encourage new collective ideas.


SAR 3 Project Review. The third SAR is used to review the project in co-presence
between students, trainers and experts. At these meetings, the students are asked to
display their project on an interactive board and interact with the rest of the class
throughout the presentation. The expert and the teacher share the same annotated
document, but theirs is projected onto a graphic table. The aim of this SAR is for the
teachers, experts and students to share opinions in real time and to enable each party
to interact either orally or through drawing. Each individual’s project thereby evolves
through collective reflection which, in our opinion, helps to reduce competition
between students.


SAR 4 Evaluation. The fourth SAR enables collective evaluation of student projects
by various co-present and remote experts interacting in real time. Here, the student is
asked to present his or her project to the class but also to the remote experts whom
he or she has already met on two previous occasions in the context of SAR 1. SAR 4
enables all participants to intervene on the final graphic documents of the student’s
project. The major objective of this SAR is to transform the documents presented
(supposedly frozen images at a final jury) into working documents that stimulate
collective reflection and the emergence of new ideas.


5 Methodology 


5.1 Collection of Data 


Longitudinal observations were conducted for each SAR, that is, observations made
over the course of several sessions on how students appropriate the tool and prepare
the design project. Only the teachers were aware of the objectives of the
observations: to define the involvement and contributions of the various SAR in the
students’ learning process in design collaboration.

In defining our protocol, we considered it vital to vary the number of actors, as
several publications focusing on collective activities (like [7]) show that the
performance of the interactions between the designers as well as the decisions
made can be influenced by the number of actors taking part (directly or indirectly) in the
design. This is why the experimental protocol was defined in such a way as to use the
same augmented system in the same class of students for collaboration over the same
period of time while varying the spatial configuration and the number of participants.
All our observations were recorded using an audio/video recording device.


Spatial Augmented Reality in Collaborative Design Training 347 


Table 1. Total of data accumulated in the four SAR 


                SAR 1               SAR 2                 SAR 3                  SAR 4
Configuration   Remote              Remote                Co-presence            Remote
Participants    1 student           3 students            1 student              1 student
                1 teacher           3 remote students     3 experts/teachers     7 experts/teachers
                4 experts (Alès)    2 experts/teachers    16 students (public)   16 students (public)
Students        17                  5 groups x 6          17                     17
Sessions        2 x 0:30            7 x 0:30              2 x 0:30               1 x 0:30
Durations       17:00               17:30                 17:00                  8:30
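The Durations row appears to follow from the rows above it: number of students (or, for SAR 2, groups) times sessions per unit times 30 minutes. The table does not define its rows, so this is our reading rather than the authors’ stated method; the short check below reproduces all four totals under that assumption, and the names in it are illustrative.

```python
SESSION_MINUTES = 30  # every session listed in the table lasts 0:30

# (units observed, sessions per unit); SAR 2 units are groups of students.
OBSERVATION_PLAN = {
    "SAR 1": (17, 2),
    "SAR 2": (5, 7),
    "SAR 3": (17, 2),
    "SAR 4": (17, 1),
}


def total_duration(units: int, sessions: int, minutes: int = SESSION_MINUTES) -> str:
    """Total recorded time, formatted as h:mm like the table."""
    total = units * sessions * minutes
    return f"{total // 60}:{total % 60:02d}"


for sar, (units, sessions) in OBSERVATION_PLAN.items():
    print(sar, total_duration(units, sessions))
# SAR 1 17:00
# SAR 2 17:30
# SAR 3 17:00
# SAR 4 8:30
```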


5.2 Data Analysis


Collaborative design can be analyzed from various points of view: (1) the physical
working conditions, (2) the emotional or psychological aspect and (3) the cognitive
aspect. This final point of view (analysis of the design process relative to the
situation, the actors and the subject being dealt with) is the one we will focus on in
this study, by considering group awareness, intermediary objects and the common
frame of reference used to study these situations.

This is why the discussions, annotations, imported documents, use of the tool and 
use of the different SAR by all participants are observed and analysed qualitatively 
using a specific coding scheme (fig. 3). This scheme distinguishes between three 
types of interaction spaces according to how the actors use them: 


1. We-Space: in which remote participants annotate and modify a shared real-time
document using the electronic pen;

2. I-Space: in which the actor works on his or her document alone;

3. Space-between: a private conversation in which certain members of the group
isolate themselves to work together independently of the We-Space in which they are
participating.


These spatial interactions are first of all described as actions relative to various
parameters, which are mainly:


- active actors: this group is made up of all the participants of the SAR under analysis; they are coded as active when they explicitly intervene in the observed situation;

- documents: these are categorized according to how they are shared. If a document is shared, is it shared with the participants as a whole or solely with one collaborator in a private conversation?

- action typology: this defines the objective of actions, such as isolating oneself, pooling information, challenging, acting on a decision or giving instructions, evaluating, producing together, negotiating or formulating the group's rules, and so on.
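The three coding parameters above map naturally onto a small record type. The sketch below is one possible layout, assuming one record per coded action; the class names, field names and enum values are hypothetical and are not taken from the authors' coding tool. The example records are illustrative only, not data from the study.

```python
from collections import Counter
from dataclasses import dataclass
from enum import Enum


class Space(Enum):
    """Interaction spaces distinguished by the coding scheme."""
    WE_SPACE = "We-Space"
    I_SPACE = "I-Space"
    SPACE_BETWEEN = "Space-Between"


class DocSharing(Enum):
    """How the document involved in an action is shared."""
    PERSONAL = "personal"
    SHARED_WITH_ALL = "shared with all participants"
    SHARED_PRIVATELY = "shared in a private conversation"


class ActionType(Enum):
    """Objectives of actions; the source list is open-ended ("and so on")."""
    ISOLATING = "isolating oneself"
    POOLING = "pooling information"
    CHALLENGING = "challenging"
    DECIDING = "acting on a decision"
    INSTRUCTING = "giving instructions"
    EVALUATING = "evaluating"
    PRODUCING_TOGETHER = "producing together"
    NEGOTIATING = "negotiating"
    RULE_MAKING = "formulating the group's rules"


@dataclass(frozen=True)
class CodedAction:
    """One coded observation (hypothetical record layout)."""
    time: str                  # clock time of the action, e.g. "14:25"
    space: Space
    active_actors: frozenset   # participants who explicitly intervene
    document: DocSharing
    action: ActionType
    note: str = ""


def actions_per_space(actions):
    """Count coded actions in each interaction space."""
    return Counter(a.space for a in actions)


# Illustrative records only, not data from the study.
example_log = [
    CodedAction("14:25", Space.WE_SPACE, frozenset({"student"}),
                DocSharing.SHARED_WITH_ALL, ActionType.POOLING),
    CodedAction("14:28", Space.SPACE_BETWEEN, frozenset({"expert A", "expert B"}),
                DocSharing.SHARED_PRIVATELY, ActionType.EVALUATING),
    CodedAction("14:29", Space.WE_SPACE, frozenset({"student", "teacher"}),
                DocSharing.SHARED_WITH_ALL, ActionType.CHALLENGING),
]
counts = actions_per_space(example_log)
```

Keeping each parameter as a closed enumeration makes tallies such as "actions per interaction space" trivial, while the free-text `note` field leaves room for the qualitative description the scheme also records.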


[Figure: coding table for SAR 4. Columns: # (sequence number), Time, Duration, Action (description of what happens); Participants; interaction space (We-Space, I-Space, Space-Between); Document (personal document, document shared by all, document shared in a private conversation); Collective action type (isolating oneself, pooling, challenging, acting on a decision, commenting, producing together, negotiating). Example rows, from 14:25 onward, describe the student presenting the project and pooling the information about it, highlighting the false ceiling of a section, switching layers to a plan on which the structure is shown by drawing the beams in red, and moving to a facade on which the elements of a detail are highlighted in colour.]

Fig. 3. Example of the coding used to analyze the data: SAR 4 "Evaluation".

These spatial interactions are then studied and analyzed by grouping them into sequences that illustrate the conversational dynamics between collaborators. We consider a sequence to be a series of successive choices that form a narrative unit in response to general questions (and/or proposals) raised by the actors during the design process. This empirical sequential division follows the logic behind the actions, as shown by the transition from one problem to the next and/or by the transformation from one state to another, a process in perpetual movement [8]. Stating a new problem (and/or a proposal, and/or a question) marks the end of one sequence and the beginning of the next.
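The segmentation rule just described (a new problem, proposal or question closes the running sequence and opens the next) can be sketched as a single pass over the coded turns. This is a minimal sketch that assumes the coder has already flagged each turn with a boolean; the names `CodedTurn`, `opens_new_problem` and `split_into_sequences` are hypothetical, and the example turns are invented for illustration.

```python
from dataclasses import dataclass


@dataclass
class CodedTurn:
    time: str
    description: str
    opens_new_problem: bool  # coder's judgement: a new problem, proposal or question is stated


def split_into_sequences(turns):
    """Group consecutive turns into sequences; a new problem starts a new sequence."""
    sequences = []
    current = []
    for turn in turns:
        if turn.opens_new_problem and current:
            sequences.append(current)  # the new problem closes the running sequence
            current = []
        current.append(turn)
    if current:
        sequences.append(current)
    return sequences


# Hypothetical turns for illustration.
turns = [
    CodedTurn("14:25", "student presents the project", True),
    CodedTurn("14:28", "student continues the presentation", False),
    CodedTurn("14:31", "expert questions the structure", True),
    CodedTurn("14:33", "student answers on a structural plan", False),
]
sequences = split_into_sequences(turns)
```

The interpretive work stays with the coder, who decides what counts as a new problem; the code only makes the boundary rule explicit and repeatable.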


6 Results: Towards a New Classification of Intermediary Spaces


Johansen proposed a spatiotemporal matrix for classifying groupware in the domain of CSCW [9]. Observing the SAR implemented in our training environment calls for a review of its "co-presence/remote" dichotomy in synchronous collaboration.

Although the tool used in these SAR was originally designed to support remote collaboration between the various protagonists involved in the project, the co-present public evaluation (SAR 3) proved to be its most interesting application. Indeed, we noticed the emergence of "augmented presence" between the student using the board (explaining his or her project) and those using the digital table (the expert and the teacher discussing the project with the class). Interaction was established between the actors as a whole, based both on a direct, co-present modality (a conversation within the same physical space) and on a remote, indirect modality (the annotation of a virtually shared document on physical supports located in the same room but distinct from one another). This SAR therefore involves new intermediary spaces and also nuances Johansen's notion of spatiality, distinguishing between real presence, augmented presence and virtual co-presence in the synchronous activity of collaboration [10].

In terms of spatial configuration, then, these SAR influence the actors' relationship with these intermediary work spaces, which are short-lived and are created according to the needs, aims, negotiation, justification and consensus building of project design. These co-spaces vary, change and evolve between the actors' personal spaces (I-Space), the co-work spaces that bring the actors together (We-Space) and the junction between the two (Space-Between). These changes enable students, teachers and experts to take part in building joint reflections so that all the actors may move together towards the same objective.

The collaborators/learners who work together on the same project and share a We-Space (SAR 2) tend to work as a unit to ensure coherent choices and interdependence between the different elements making up each individual's reflections. The learner who shows his or her project to the other students, experts and teachers, thereby making the transition from I-Space to We-Space (SAR 3), gains a new perspective on his or her own production while benefiting from the collective reflections of the other participants. Learners who call on experts from other domains to feed and answer their queries (SAR 1) enrich the pool of opinions, knowledge and references linked to the project. At the final jury (SAR 4), private conversations develop between the learner and the students, and occasionally even the expert in co-presence (Space-Between), creating a situation where the project as an end-product is discussed and challenged.

All these intermediary spaces therefore involve the mechanisms implicit in pooling knowledge, sharing understanding and achieving cognitive synchronization, which relate to building mutual awareness of the working environment (social awareness), of the design (activity awareness) and of the tasks and contributions of each person within a group (action awareness) [11]. These mechanisms and their links to the intermediary spaces (We-Space, I-Space and Space-Between) are detailed in the following subsections.


6.1 We-Space 


Taken as a whole, the SAR mainly encourage this intermediary space: the co-work space. This space was analyzed using the coding table, focusing more specifically on the actions, the documents brought into play and the relationships maintained between the actors. It is nevertheless important to point out that the project evolves mainly through speech, even though the spoken words are often translated into drawings to explain and justify choices and to reach a consensus between all the actors. These drawings are manipulated collectively, both in presence and remotely. They constitute a shared, interactive boundary object that evolves through negotiation and consensus building between students (SAR 2), experts (SAR 1) and trainers (SAR 3 and 4). These artefacts promote collective reflection on the productions, which in turn generates new shared representations (especially in SAR 2). Moreover, they translate the student's individual design project, enabling him or her to view it from a different perspective (I-Space) and to construct his or her own discourse, as well as new interpretations and reflections among the other learners (Space-Between) present at the public evaluation (SAR 3) and the jury (SAR 4). The SAR in which these interactive intermediary objects are manipulated therefore reduce the spatiotemporal gaps involved, because feedback to the various users following an action on the document is immediate.
The We-Space is particularly encouraged because there is no loss in the causal links between what an actor says, the annotations he or she creates (in either augmented presence or virtual co-presence) and the information the other actors receive.

Consider the case of remote collaboration between ULg students and students from the School of Architecture of Nancy (SAR 2). Two new modes of exchange were observed in this scenario: one that brings out the need to create one's own I-Space from a We-Space, and one that shows the possibility of two people creating a joint drawing, thereby emphasizing the We-Space. In both situations, graphic representations created by two people were brought into play to work on the object being designed. In the first case, the students individually propose different views of the project under discussion by dividing the We-Space into two, thereby creating two I-Spaces. One student draws an interior view of the project while the other draws a cross-section; both can see the plan they have previously discussed. As all participants see what the others are doing, the student drawing the interior view can, without speaking, readjust his or her sketch while watching the cross-section being constructed simultaneously by the remote collaborator. This juxtaposition of representations developing on the same shared digital interface encourages cross-interpretation by both actors, even though their initial intention was for each to have a personal working space. In the second case, both students draw the same perspective of the plan that has been discussed and worked on collaboratively beforehand. In this way, they pool the choices previously made without even having to speak to one another.


6.2 I-Space 


The SAR were implemented to promote the We-Space; they were not specifically designed to enable independent work. Independent work means isolating oneself and designing independently while taking into account the work done by the others [9]. Visser also introduces this notion when she speaks about "parallel activities" and their importance in collective work. In her view, these parallel activities are marked by interruptions and resumptions throughout the process. They "constitute an indication of the place 'individual conception' can occupy in a co-design meeting between architects" [2, p.152].

In our view, independence marks the time spent working in isolation: the moments when the project is thought out independently while still taking the opinions of the others into account. Thus the need to "create one's own I-Space" emerges, because work involving several people does not systematically lead to continuous interaction. Even if the students, trainers and experts tend to reflect together on how the project will evolve, our analyses show that each individual thinks independently and constructs his or her own isolated reflection at given moments. Even the students in the audience take notes during the presentation at the jury, because they are strategically readjusting what they plan to say and their answers in light of the remarks made by the experts and teachers. Often, this I-Space is marked by:
- moments of silence during which the actors isolate themselves, either to produce notes and/or personal graphic sketches, or simply to read data or look for a reference to illustrate an idea. In this situation, certain I-Spaces allow joint reflection to continue while others interrupt it so that the actors can reflect upon a new point of view;
- personal spoken comments, made in an undertone and not intended to be shared, but which nevertheless result from collaborative situations. In this case, the I-Space becomes a Space-Between, which allows certain actors to create private conversations within a collaborative activity (cf. 6.3).

Yet this I-Space is not supported by the SAR installed in our training programs. Participants in the various sessions cannot create their own individual graphic space (I-Space) unless they use the tool differently or create a personal space independent of the SAR concerned, for example by using a personal notebook. As seen in the SAR 2 We-Space, certain collaborators find a solution by geometrically dividing the space into two parts. This juxtaposition of I-Spaces within a We-Space encourages cross-interpretation, followed by a pooling of this information and the confrontation of the two proposals, which in turn leads to new cross-interpretations. Once these independent actions are completed, new reflections emerge, which are often then pooled and shared with the other participants. An example of this is when an expert takes notes during a student's presentation and then shares the comments with the student so that they can work together to develop a joint reflection on the project.

Today, we believe it is important to develop the SAR so that I-Spaces can be formed. Indeed, even though the SAR do not prevent participants from keeping their own private notebook, for example with pen and paper outside the shared interface, they nevertheless oblige participants to alter the tool's function by setting up a private working method in order to construct a personal work space. This I-Space is all the more important in collaboration because it allows the participant, the student in particular, to refocus on his or her own perceptions and individual interpretations. These choices and personal interests are more often than not defined by dividing the tasks according to the needs and/or interests of each individual in the group (particularly in the case of SAR 2).


6.3 Space-Between 


The Space-Between is a private conversation created between two or more participants independently of the rest of the group. Like the I-Space, the Space-Between is not managed by the SAR, especially not between remote actors. This is all the more problematic because the Space-Between is mainly based on oral exchange. In SAR 2 and SAR 4, the creation of a Space-Between actually disturbs the collaborative process rather than nurturing it: in group communication via video-conferencing, all sounds emitted at one site (near or far, in a low or a loud voice) are heard at the same level by the geographically remote participants.

Yet collective activities develop from, among other things, social interactions, whether collective or private. Private interactions are particularly important in collective activities that involve a well-identified hierarchical relationship between actors. As the tool enables participants to draw synchronously and remotely, remote actors can intervene peer-to-peer (SAR 1 and 2). All other players attending the project review may also modify their drawings on the basis of the examiner's corrections (whether the examiner is an expert or a teacher).

By giving all participants the possibility of adjusting the document, the SAR make the teacher's modifications less sacrosanct, which encourages the challenging of choices, regardless of the participant who made them. Whereas a classic project review leaves learners alone with the teacher, with no possible interaction with their classmates, the SAR encourage exchange between students and limit competition. Sharing points of view, building common operative referentials, finding common ground and cognitive synchronization are thereby encouraged by the SAR, which enable the We-Space but also manage the Space-Between. These SAR contribute to building common ground relative to the project and manage activity awareness very well. In contrast, they support neither social awareness (because it is difficult to know what is happening behind the screen and where a given background noise is coming from) nor action awareness (because the SAR give no hint of the specific characteristics of each participant and their tasks in the collaborative activity). This is mainly due to the impossibility of creating a Space-Between, particularly in the case of SAR 2 and 4.


7 Conclusion: Towards Articulation between Intermediary Spaces


As observed in the previous paragraphs, the intermediary spaces cannot be separated. The We-Space, I-Space and Space-Between together define the co-work space between the actors. Nevertheless, moving between these diverse intermediary spaces requires flexibility that should be provided by the tool, so that each individual may easily manipulate and structure his or her interface. This flexibility is currently only partially supported by the system used in the SAR presented here. SketSha, a piece of software enabling synchronous sharing of graphic annotations, was initially designed for pooling synchronous work on documents, but video-conferencing does not enable the actors on either side of the screen to isolate themselves and hold their own private conversations. Our qualitative analysis also confirms our earlier results [10]: the SAR contribute fully to group cohesion by creating intermediary spatialities between augmented presence and virtual co-presence. They help and equip students in learning how to collaborate. They encourage peer-to-peer sharing between learners, trainers and experts, but at the expense of independent work and private conversations. Moments of isolation enable interlocutors to express their ideas using their own knowledge, references and contexts, and they tend to bring together a number of singularities: those of the student, the other participants, the project to be designed and the tool being used. The designer/student must therefore juggle these singularities through personal interpretations constructed in the I-Space and the Space-Between, which he or she gradually appropriates as the collaboration progresses.
In this way, the next generation of SAR will also have to enable users to move easily between the shared work space (the We-Space as it stands today) and a personal or private space, according to the needs and contexts of the collaborators. Articulation of these intermediary spaces within the SAR is currently at the pre-test stage, using technical solutions based on individual graphics tablets placed on the shared table. This will better support cognitive synchronization among the co-actors, as it will be promoted by more flexible access to the other intermediary augmented spaces. In turn, this will promote understanding of design as a complex activity.
Abstract. This paper introduces the concept of escape and evacuation from passenger ships from the perspective of ship design and risk management. As part of that process, the use of computer simulation tools to analyse the evacuation performance of ships carrying large numbers of persons on board is becoming more relevant and useful. The objective of this paper is to present the pedestrian dynamics simulation tool EVI, developed to undertake advanced escape and evacuation analysis in the design verification of cruise vessels, passenger ferries and large offshore construction vessels, among others.


Keywords: Evacuation analysis, passenger ships, offshore vessels. 


1 Introduction 


Innovation in ship design has traditionally been a feature of the cruise and ferry sectors of the maritime industry. The design of passenger ships has evolved dramatically during the past 30 years, driven, among other things, by increasing customer expectations, business opportunities, technological progress and societal demands for increased safety and environmental greenness. The single most significant trend is the growth in ship size: the largest cruise vessel today can carry more than 5,000 passengers on board (some 8,400 people including the crew) and measures more than 350 m in length.

Another trend in the industry has been fuelled by the emergence of offshore construction, which has led to the development of a new type of working vessel with the capacity to carry and accommodate a large number of special personnel (workers) on board. These vessels, referred to as Special Purpose Ships (SPS), may be subject to the same rigorous design verification as large passenger ships when the number of persons on board exceeds 240.

Safety is arguably the single most significant design driver for passenger ships today, with safety requirements now driven by explicit safety goals and including quantitative verification of residual capabilities in case of accidental events. Those capabilities relate to stability after flooding, extensive fire protection, redundancy of essential ship systems (in line with the ‘safe return to port’ philosophy) and, ultimately, escape and evacuation arrangements: the last safety barrier if everything else fails.
Given this significance, validation of escape and evacuation arrangements is gradually taking a more prominent place in the conceptual ship design iteration and verification process. To this end, following initial developments at the University of Strathclyde in the late 1990s to support the rule-making process, the focus at Safety at Sea since 2001 has clearly been on supporting ship design and operation. Initially, the software was designed to undertake advanced evacuation analysis for Ro-Ro passenger vessels in accordance with the guidelines developed by the International Maritime Organization [1]. More recently, it has been used as a consequence analysis tool during the conceptual design risk analysis of large passenger vessels, offshore platforms and special purpose ships (pipe layers, drilling ships, crane vessels, among others).

A brief overview of the ship-evacuation problem is presented in Section 2 with 
emphasis on the many factors that influence the process of ship evacuation. In Section 
3, a general description of the key features of the EVI simulation model is presented. 
These key features represent the concept and implementation of the solution to the 
problem defined in Section 2. 

The paper concludes in Section 4 with some practical observations based on the 
experience gained from the use of the tool in a number of commercial applications 
and design projects. 


2 Ship-Based Evacuation Problem 


A number of aspects of the ship evacuation process influence its outcome and therefore have to be taken into account when simulating and analysing the process. A brief overview of these factors is given below.


2.1 Emergency Scenarios 


A ship may need to be evacuated in an emergency if the risk to the persons on board is deemed to be unacceptable. For the majority of ships, emergency scenarios requiring ship abandonment may be associated with shipping accidents, such as collision/grounding leading to flooding, fire or explosions. A generic procedure for dealing with an incident, referred to as the 'muster list', is illustrated in Table 1. As can be seen, the process of evacuation is normally carried out in stages. In each stage, different activities with different objectives may occur concurrently.


Table 1. Generic (typical) muster list for a passenger ship

INCIDENT
STAGE 1                | STAGE 2                 | STAGE 3
(1) Detection & Alarm  | (2) Damage control      | (5) Abandon Ship
                       | (3) Muster of Pax       | (6) Rescue
                       | (4) Preparation of LSA  |

Pax = passengers; LSA = life-saving appliances.


The incident itself (e.g. fire, flooding) may physically affect the evacuation arrangements. This impact can include the following:

• Impairment/inaccessibility of escape routes, muster areas or evacuation systems (e.g. due to damage, heat, smoke or floodwater);

• Heel and/or trim of the ship (due to flooding), leading to inclination of the surfaces used as escape routes; these may slow down the movement of evacuees or stop them altogether. Severe inclinations (more than 20 degrees) can prevent the occupants from deploying evacuation systems.


2.2 The Ship Environment


The purpose of the ship determines its internal layout. The layout is a complex collection of spaces of different use, distributed along horizontal decks and vertical fire zones. The function of the spaces varies greatly from ship to ship:

• Passenger Vessels: the layout includes a variety of public spaces (such as restaurants, theatres, shopping malls, lobbies, sun decks, bars, discos, casinos and many others), cabins and crew service spaces (machinery, galleys, hotel services, etc.).

• Offshore Vessels: the layout includes a variety of spaces in the living quarters (cabins, recreation spaces, meeting rooms, offices, control rooms, etc.), working stations for special personnel (pipe manufacturing stations, crane workstations, working decks, etc.) and marine crew service spaces (machinery, workshops, stores, etc.).


The geometrical and topological features, as well as the different functions of spaces within a ship, greatly influence the location of the evacuees at the moment of the incident and, in some cases, the awareness and/or response time of the occupants. For example, people in cabins may be asleep, while people at working stations (e.g. welding, heavy-lift cranes) may be delayed by requirements for the safe termination of work.


2.3 Escape and Evacuation Arrangements


Escape and evacuation arrangements can be considered as risk control measures or barriers aimed at mitigating the severity of the consequences of an accidental event. These measures are mainly passive in nature and include the following:


Alarm Systems. Public address and alarm systems are the means of communicating an emergency signal to all persons on board. They influence the time people need to become aware of, and respond to, the incident. The General Alarm signalling the order to muster is typically activated by the crew once the incident is validated.


Escape Routes. These comprise hatches, doors, corridors, stairs, walkways, ladders and other spaces connecting all spaces on board to a muster area or a safe refuge. Most spaces on board ships are fitted with at least two emergency exits. All exits lead to a primary and a secondary escape route to a muster point. The capacity of the escape routes is generally driven by the width of the escapes and the redundancy of the routes from different areas of the layout.
Muster Areas. These are spaces that can be located internally (public areas) or externally (near embarkation stations). The capacity and specification of muster areas vary significantly between passenger ships and offshore units/vessels. For passenger ships, at least 0.35 m² per person has to be provided (e.g. for 500 persons, a minimum of 175 m² of deck space has to be provided in the muster area).
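The 0.35 m² per person figure quoted above is a linear rule, so it can be applied in both directions: minimum area for a given head count, and maximum head count for a given area. The sketch below is only a worked version of that arithmetic for passenger ships; the inverse function `max_persons` is our addition, and `Decimal` is used so that the division does not suffer from binary floating-point rounding at exact multiples.

```python
from decimal import Decimal

# Minimum muster area per person for passenger ships, as quoted in the text.
AREA_PER_PERSON_M2 = Decimal("0.35")


def min_muster_area(persons: int) -> Decimal:
    """Minimum deck area (m²) required to muster `persons` people."""
    return persons * AREA_PER_PERSON_M2


def max_persons(area_m2: Decimal) -> int:
    """Largest number of persons a muster area of `area_m2` m² can accommodate."""
    return int(area_m2 // AREA_PER_PERSON_M2)


print(min_muster_area(500))  # 175.00
```

For example, `max_persons(Decimal("174.9"))` gives 499: falling even slightly short of 175 m² means the area can no longer serve 500 persons.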


Lifesaving Systems. These comprise survival craft (e.g. lifeboats) and other systems to assist in the abandonment of the ship. These systems have to be prepared before use (if not stowed in the embarkation position) and are usually located near or by the muster areas. The capacities of these systems vary from ship to ship. Typically, lifeboats for up to 150 persons are fitted to most passenger ships. Recently, lifeboats with capacities of up to 370 persons have been developed. The arrangement of survival craft can significantly influence the procedures and time of ship abandonment.


2.4 Human and Organisational Factors 


Number of Persons on Board (POB). The POB depends on the purpose of the vessel/offshore unit and the operational mode. A typical cruise vessel carries about 4,000 persons (including crew). Offshore construction vessels may carry up to 600 persons.


Demographics. The demographic characteristics (age, gender, etc.) of the evacuees greatly influence their walking speed and reaction time to an alarm. The demographics differ greatly between passenger ships and offshore working vessels. Whilst on passenger vessels the people on board are representative of the general population (including children and people with mobility impairments), on offshore working vessels they are personnel specifically trained to work in offshore conditions (their level of fitness, familiarity with the layout, emergency preparedness and competence are significantly higher than those of the typical passenger population).


Crew Emergency Tasks. As indicated in Table 1, in most situations crew are expected to undertake active damage control and assist passengers during the muster and ship abandonment process. Crew emergency tasks include directing passengers to the correct muster point, or to alternative routes if the primary escapes are impaired, and reducing the awareness time (e.g. by actively searching cabins for people), among others. This requires active internal communication among crew and between crew and passengers, which in essence amounts to giving and updating the objectives of individual evacuees.


2.5 External Factors 


Sea State. Wind and waves act directly on the ship's behaviour, which in turn translates into ship motions. Accelerations induced by ship motions may affect the walking speed of evacuees and even their decision-making.
Time of Day. In passenger vessels, the time of day determines the initial location of persons on board at the moment of an incident. During the night, persons are more likely to be in their cabins and asleep, which decreases their awareness and increases their reaction time. During the day, the range of activities on board and the location of the spaces will determine the choice of muster areas (usually the nearest possible) and the routes people eventually take to reach the muster points (usually the most familiar). In working ships, the impact of the time of day is lower, as these ships usually operate in shifts, i.e. they carry the same number of persons during the day and at night.


3 Evacuation Simulation 


The software EVI, in its current form, was conceived in 2001 [2]. The first concept of the simulation tool was presented in 2001 [3]. Since then, the code has undergone further development driven mainly by commercial applications. The key design principles and assumptions are outlined below.


3.1 Multi-agent Simulation


The EVI simulation is an implementation of multi-agent modelling, which is a further generalisation of process-based modelling methods in which the environment is very well defined and the agents may communicate in a fairly versatile manner. In natural systems, all component parts "live" in some sort of topological space (predators and prey may live on a two-dimensional forest floor, data packages traverse a network graph and evacuees move around on a 2D deck). An environment is defined as an artificial representation of this space. Autonomous agents can perform the activities defined by a computer program in this environment. This strong sense of environment does not exist in a process-based simulation, where processes are only aware of themselves and the resources they wish to acquire. Communication in multi-agent simulation describes all interaction between real-life entities. This makes multi-agent simulation an extremely powerful tool, but also one that is hard to verify in the context of known mathematical theory. Using agents essentially requires a rigorous definition and full implementation of the environment and its interfaces with the agents, as well as an inter-agent communication protocol.


3.2 The Environment


The definition of the environment is one of the most important aspects of multi-agent modelling. It comprises three aspects: (i) geometry, (ii) topology and (iii) domain semantics. The whole ship layout is segmented into Euclidean convex regions with the structure of a linear space; two regions are directly connected if they have a common gate. For all computation and analysis purposes, this connectivity topology can be represented by a graph.

In ship layout terms, regions correspond to spaces and gates correspond to doors. Regions can be defined as rectangular or convex polygons with attributes that control initial conditions and carry semantic information that agents may query when traversing them (such as initial number of persons, fire zone, destination, etc.). Regions can be located on different levels, called decks, each defined by its height above a reference level or baseline. The problem of finding the path of an agent to a muster point is thus reduced to a search of the topology graph.
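As a minimal sketch of the region/gate topology described above, the Python below stores regions as graph nodes and doors as labelled edges. The class names (`Region`, `Door`, `ShipEnvironment`) and the chosen attributes are illustrative assumptions, not EVI's actual data model.

```python
from dataclasses import dataclass


@dataclass
class Region:
    """A convex region (ship space) with a few illustrative semantic attributes."""
    name: str
    deck: int                      # deck index (level above the baseline)
    fire_zone: int = 0
    initial_persons: int = 0
    is_assembly_station: bool = False


@dataclass
class Door:
    """A gate shared by exactly two regions."""
    name: str
    region_a: str
    region_b: str


class ShipEnvironment:
    """Regions are graph nodes; two regions are adjacent iff they share a door."""

    def __init__(self) -> None:
        self.regions: dict[str, Region] = {}
        self.doors: dict[str, Door] = {}
        self.adjacency: dict[str, list[tuple[str, str]]] = {}

    def add_region(self, region: Region) -> None:
        self.regions[region.name] = region
        self.adjacency.setdefault(region.name, [])

    def add_door(self, door: Door) -> None:
        self.doors[door.name] = door
        # Each door creates an undirected edge, labelled by the door name.
        self.adjacency[door.region_a].append((door.region_b, door.name))
        self.adjacency[door.region_b].append((door.region_a, door.name))

    def neighbours(self, region_name: str) -> list[str]:
        return [other for other, _ in self.adjacency[region_name]]


# Hypothetical example: a cabin on deck 7 connected to a muster station on deck 5.
env = ShipEnvironment()
for r in (Region("cabin_701", deck=7, initial_persons=2),
          Region("corridor_7", deck=7),
          Region("stair_A", deck=7),
          Region("muster_B", deck=5, is_assembly_station=True)):
    env.add_region(r)
env.add_door(Door("D1", "cabin_701", "corridor_7"))
env.add_door(Door("D2", "corridor_7", "stair_A"))
env.add_door(Door("D3", "stair_A", "muster_B"))
```

With this structure, route finding works purely on `adjacency`; the geometry of each region is only needed for movement inside that region.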


3.3 The Agents


The lowest common denominator of the many definitions of "agent" is an encapsulation of code and data that has its own thread of control and is capable of independently executing the appropriate piece of code depending on its own state (the encapsulated data), the observables (the environment) and the stimuli (messages from other parts of the system, or provided interactively). The agent-action model is essentially a 'sense-decide-act' loop. The sense and decide steps may be coalesced, as sensing is nothing more than the interface of the agent with the data structures representing the environment. The decision process requires access to the perceived information; thus perception is not a complex process but rather a simple access interface between the environment and the agents. Notably, the actions of agents may also change the environment, giving rise to what is called interactive fiction. To address the modelling of human behaviour at the microscopic and macroscopic levels, the agent model itself can be seen as being composed of a number of levels, see Fig. 1.
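The sense-decide-act loop can be illustrated with a deliberately tiny, hypothetical example: a one-dimensional corridor of cells, where sensing is only a read of a shared occupancy set and acting updates that set, so one agent's action changes what the next agent senses. Nothing here is taken from EVI's code; the `Corridor` and `Agent` names are assumptions for illustration.

```python
from dataclasses import dataclass


@dataclass
class Corridor:
    """Toy 1-D environment: the set of occupied cells."""
    occupied: set[int]


class Agent:
    def __init__(self, position: int, goal: int) -> None:
        self.position = position
        self.goal = goal

    def _next_cell(self) -> int:
        return self.position + (1 if self.goal > self.position else -1)

    def sense(self, env: Corridor) -> bool:
        # Perception is only a read-only query of the environment's data structures.
        return self._next_cell() not in env.occupied

    def decide(self, next_cell_free: bool) -> str:
        if self.position == self.goal:
            return "done"
        return "move" if next_cell_free else "wait"

    def act(self, action: str, env: Corridor) -> None:
        if action == "move":
            # Acting changes the environment that other agents will sense.
            env.occupied.discard(self.position)
            self.position = self._next_cell()
            env.occupied.add(self.position)

    def step(self, env: Corridor) -> str:
        action = self.decide(self.sense(env))
        self.act(action, env)
        return action


env = Corridor(occupied={0, 1})
front, back = Agent(1, goal=3), Agent(0, goal=3)
actions = [back.step(env), front.step(env), back.step(env)]
# actions == ["wait", "move", "move"]
```

The rear agent first waits because the cell ahead is occupied, and can move only after the front agent's action has freed that cell.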


[Figure: levels of the agent model, pairing each model concept with its implementation. Task or Desire: Objectives (procedural stimuli); Path-Planning; Local Environment Knowledge: Waypoints; Macroscopic Behaviour; Local Environment Stimulus: Environment / Inter-Agent Obstacle Calculations (environment and inter-agent stimuli); Location: Position Updating; Microscopic Behaviour]

Fig. 1. The agent model in EVI


At the highest level, an Objective defines the agent's task or desire, for example: go to a cabin and wait for 60 seconds; search all the cabins on deck 7, fire zone 3, port side; or evacuate to the nearest assembly station. To fulfil this desire, the Objective requests a path plan (routing) to be calculated, which defines which doors the agent should go through, and in what order, to advance from its current location to the destination. Once this data structure is in place, the agent selects a waypoint from the first door in the path-plan route: an intermediate location to travel to, usually in direct line of sight of the agent (i.e. within a convex region). With the direction of travel defined by the waypoint, the agent moves towards that location using position updating. In doing so, the agent avoids the boundaries of spaces and other agents in the locality by taking account of environment and inter-agent conflicts.
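The chain from Objective to path plan, waypoint and position updating can be sketched as follows, assuming the path plan is already available as an ordered list of door positions and using a straight-line step of length speed * dt. The `Walker` class and `update_position` function are hypothetical, and boundary and inter-agent avoidance (the microscopic layer) are deliberately left out.

```python
import math
from dataclasses import dataclass

Point = tuple[float, float]


@dataclass
class Walker:
    position: Point
    speed: float              # nominal walking speed, m/s
    path_plan: list[Point]    # positions of the doors still to pass, in order


def update_position(agent: Walker, dt: float) -> None:
    """Move one time step towards the current waypoint (the first door in the plan).

    Inside a convex region the waypoint is in direct line of sight, so a straight
    step is always possible; boundary and inter-agent avoidance are omitted here.
    """
    if not agent.path_plan:
        return  # objective reached
    wx, wy = agent.path_plan[0]
    x, y = agent.position
    dx, dy = wx - x, wy - y
    dist = math.hypot(dx, dy)
    step = agent.speed * dt
    if dist <= step:
        agent.position = (wx, wy)
        agent.path_plan.pop(0)  # door reached: the next door becomes the waypoint
    else:
        agent.position = (x + dx / dist * step, y + dy / dist * step)


walker = Walker(position=(0.0, 0.0), speed=1.0, path_plan=[(3.0, 0.0), (3.0, 4.0)])
for _ in range(3):
    update_position(walker, dt=1.0)
```

After three one-second steps the walker stands at the first door, (3.0, 0.0), and the second door has become its waypoint.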


3.4 Mesoscopic Modelling 


Ship arrangements are large, with many routes from one location to another and endless choices along the way. As a person traverses a route, he/she has to interact with other people along the route and react to the surrounding environment. This gives rise to two main ways of considering the problem: (i) macroscopic modelling, which addresses how passengers find their way from one part of the ship environment to another (high-level planning), and (ii) microscopic modelling, which considers how individuals interact with the environment within close proximity (low-level planning).


Microscopic Behaviour. The microscopic model covers the movement of agents within spaces. It dictates how agents avoid the boundaries of spaces and how they avoid other agents. Given these constraints, the objective is to steer the agent towards a local destination (waypoint) in an optimal manner without being uncooperative towards the other agents in the space.


Environment discretisation and the agents. Given that the environment is discretised into convex regions, moving from one door (gate) to another becomes a process of pursuing a static target. However, additional complexities such as other agents and obstacles make steering significantly more complex. How this specific problem is approached determines the entire design of the simulation architecture. In this respect, two general approaches can be identified: (i) grid-based techniques and (ii) social force models. Both approaches have their merits and constraints; EVI combines the effectiveness of grid-based techniques with the flexibility of social force methods, see Fig. 2.


[Figure panels: Grid-based | Hybrid (EVI) | Social forces]

Fig. 2. Agents in the environment


To simplify the calculation, a range of discrete decisions is established around the agent, with the objective of identifying the one that allows the agent to travel the greatest distance towards the local target. In addition, a continuous local (social/personal) space is established around each agent, which other agents aim to avoid. This space is used to prevent deadlock when the number of agents in an area becomes high. The agent decides how best to use its personal space to resolve any conflicts that may arise. As a result, this approach allows the evacuation process to be modelled in sufficient detail while still running in real time or faster. To move, each agent needs to be aware of its local surroundings and draw conclusions on how to move. This update procedure is defined in terms of three steps: perception, decision and action.


Perception. Agents use their update vector to check their personal space for boundaries (containment) and other agents (collision avoidance and lane formation). This check is carried out over a set of discrete directions. The magnitude of the vector corresponds to the distance that can be travelled over the time step at a given nominal walking speed.

Fig. 3. Agent microscopic behaviour
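A minimal sketch of the perception step and the choice among discrete directions, under stated assumptions: eight candidate directions, update vectors of length speed * dt, a circular personal space of fixed radius, and a rectangular region standing in for a general convex polygon. The function names and the "closest to the waypoint" criterion are illustrative, not EVI's published algorithm.

```python
import math
from typing import Optional

Point = tuple[float, float]
Bounds = tuple[float, float, float, float]  # xmin, ymin, xmax, ymax


def candidate_steps(speed: float, dt: float, n_dirs: int = 8) -> list[Point]:
    """Discrete update vectors; their length is the distance walkable in one time step."""
    step = speed * dt
    return [(step * math.cos(2 * math.pi * k / n_dirs),
             step * math.sin(2 * math.pi * k / n_dirs)) for k in range(n_dirs)]


def is_free(p: Point, others: list[Point], radius: float, bounds: Bounds) -> bool:
    xmin, ymin, xmax, ymax = bounds
    if not (xmin <= p[0] <= xmax and ymin <= p[1] <= ymax):
        return False  # containment: the step would leave the region
    # Collision avoidance: stay outside the personal space of every other agent.
    return all(math.dist(p, q) >= radius for q in others)


def choose_step(pos: Point, target: Point, speed: float, dt: float,
                others: list[Point], radius: float, bounds: Bounds) -> Optional[Point]:
    """Return the free candidate position closest to the waypoint, or None."""
    best, best_d = None, math.dist(pos, target)
    for sx, sy in candidate_steps(speed, dt):
        p = (pos[0] + sx, pos[1] + sy)
        d = math.dist(p, target)
        if d < best_d and is_free(p, others, radius, bounds):
            best, best_d = p, d
    return best  # None: no free step makes progress, e.g. a cue to wait
```

Returning `None` rather than forcing a move leaves the choice of what to do next (wait, swap, squeeze through) to the decision step.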


Decision. A rational, rule-based process is used to select the action to take in the current time step. The decision process combines information from the previous time step with information acquired by the perception algorithm. The algorithm also gathers state information from the current environment and considers a number of discrete possibilities for updating the agent status:

• Update: the agent updates as normal, moving as far along the update vector as possible.

• Wait: the agent does not move.

• Swap with Agent: the agent, in collaboration with an oncoming agent, has decided to swap positions to resolve a deadlock.

• Squeeze through: the agent is congested, but perception has indicated that it can progress if it disregards its personal space.

• Step back: an agent who is squeezing through has violated the personal space of another agent, whose direction of update is reversed to let the squeezing agent through.
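The five options above can be encoded as an enumeration plus a rule-based selector. The sketch below assumes boolean perception flags and a priority order (step back, update, swap, squeeze through, wait) that are both hypothetical; the text lists the options but not how they are ranked.

```python
from enum import Enum, auto


class Decision(Enum):
    UPDATE = auto()
    WAIT = auto()
    SWAP_WITH_AGENT = auto()
    SQUEEZE_THROUGH = auto()
    STEP_BACK = auto()


def decide(can_progress: bool,
           can_progress_ignoring_space: bool,
           oncoming_agent_agrees_to_swap: bool,
           space_violated_by_squeezer: bool) -> Decision:
    """Rule-based choice among the five options (hypothetical priority order)."""
    if space_violated_by_squeezer:
        return Decision.STEP_BACK        # make room for an agent squeezing through
    if can_progress:
        return Decision.UPDATE           # normal move along the update vector
    if oncoming_agent_agrees_to_swap:
        return Decision.SWAP_WITH_AGENT  # resolve a head-on deadlock
    if can_progress_ignoring_space:
        return Decision.SQUEEZE_THROUGH  # congested, but progress is possible
    return Decision.WAIT
```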


Action. This consists of carefully updating the status of all agents according to the decisions made. Because of the nature of software programming, this is necessarily a sequential activity, to avoid loss of synchronisation. To ensure that agents update properly, order is introduced into the system: each agent requests the agents in front of it that are travelling in the same direction to update before it updates itself.
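One simple way to obtain the update order described above is to group agents by direction of travel and sort each group front to back by projecting positions onto the heading. This sketch assumes headings are unit vectors and that agents sharing a (rounded) heading travel in the same direction; EVI achieves the order through requests between agents, which this sort only emulates.

```python
from collections import defaultdict

Point = tuple[float, float]


def update_order(agents: dict[str, tuple[Point, Point]]) -> list[str]:
    """Order agents so that, within each group sharing a heading, the front-most updates first.

    agents maps an agent id to (position, unit heading vector).
    """
    groups: dict[Point, list[str]] = defaultdict(list)
    for agent_id, (_, heading) in agents.items():
        key = (round(heading[0], 3), round(heading[1], 3))  # same direction of travel
        groups[key].append(agent_id)
    order: list[str] = []
    for (hx, hy), ids in groups.items():
        # A larger projection onto the heading means further ahead in the direction of travel.
        ids.sort(key=lambda a: agents[a][0][0] * hx + agents[a][0][1] * hy, reverse=True)
        order.extend(ids)
    return order


crowd = {
    "a": ((0.0, 0.0), (1.0, 0.0)),
    "b": ((2.0, 0.0), (1.0, 0.0)),
    "c": ((1.0, 0.0), (1.0, 0.0)),
    "d": ((5.0, 0.0), (-1.0, 0.0)),
}
print(update_order(crowd))  # ['b', 'c', 'a', 'd']
```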
Macroscopic Behaviour. The macroscopic behaviour defines the way an agent travels from one location to another on board the ship layout. Building on the graph structure defined within the model, the shortest route to a destination is identified using Dijkstra's classic shortest-path algorithm, with the weighting taken as the distance between doors. This concept is very similar to the potential methods used in other evacuation simulation models, except that distance is only considered along the links of the graph rather than throughout space. Once route information has been generated for each node, travelling from one point in the environment to another is simply a case of following the sequence of information laid down by the search; this is referred to as the path plan.
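The route search can be sketched with a standard Dijkstra implementation on a door graph, which is one natural reading of the description above: doors are nodes, and edge weights are distances between doors opening into the same region. Running the search outwards from the destination's doors yields route information for every node, from which a path plan is read off. The data layout and function names are assumptions.

```python
import heapq
from typing import Optional


def route_table(edges: dict[str, list[tuple[str, float]]],
                exit_doors: list[str]) -> tuple[dict[str, float], dict[str, Optional[str]]]:
    """Dijkstra's algorithm run outwards from the destination's doors.

    edges maps a door to (neighbouring door, distance between the two doors).
    Returns, per door, the distance to the destination and the next door to head for.
    """
    dist = {d: 0.0 for d in exit_doors}
    next_door: dict[str, Optional[str]] = {d: None for d in exit_doors}
    heap = [(0.0, d) for d in exit_doors]
    heapq.heapify(heap)
    while heap:
        d_u, u = heapq.heappop(heap)
        if d_u > dist[u]:
            continue  # stale queue entry
        for v, w in edges.get(u, []):
            candidate = d_u + w
            if candidate < dist.get(v, float("inf")):
                dist[v] = candidate
                next_door[v] = u  # from v, the shortest route continues via u
                heapq.heappush(heap, (candidate, v))
    return dist, next_door


def path_plan(start_door: str, next_door: dict[str, Optional[str]]) -> list[str]:
    """Follow the route information from a door to the destination."""
    plan = [start_door]
    while next_door[plan[-1]] is not None:
        plan.append(next_door[plan[-1]])
    return plan


edges = {
    "D1": [("D2", 5.0), ("D3", 20.0)],
    "D2": [("D1", 5.0), ("D3", 4.0)],
    "D3": [("D1", 20.0), ("D2", 4.0)],
}
dist, nxt = route_table(edges, exit_doors=["D3"])
print(path_plan("D1", nxt))  # ['D1', 'D2', 'D3']
```

The detour via D2 (5 + 4 = 9 m) beats the direct link (20 m), which is exactly the graph-only distance notion the text contrasts with potential methods.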

Path-plan information is generated on demand when required by agents and, except where the path plan refers to an assembly station, route information is deleted when no longer required. To ensure that the path planner respects the signage within the ship arrangement, region and door attributes include definitions of primary exits and primary routes, which can force agents to use specific routes.
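The on-demand generation and deletion of route information could be organised as a small cache. In this hypothetical sketch, route tables are reference-counted per destination and freed when the last agent releases them, except for assembly stations, whose tables are retained; the `compute` callback stands in for the shortest-path search.

```python
from typing import Callable


class PathPlanCache:
    """Route tables computed on demand; only assembly-station tables are kept for good."""

    def __init__(self, compute: Callable[[str], dict],
                 is_assembly_station: Callable[[str], bool]) -> None:
        self._compute = compute
        self._is_station = is_assembly_station
        self._tables: dict[str, dict] = {}
        self._users: dict[str, int] = {}

    def acquire(self, destination: str) -> dict:
        if destination not in self._tables:
            self._tables[destination] = self._compute(destination)  # generate on demand
        self._users[destination] = self._users.get(destination, 0) + 1
        return self._tables[destination]

    def release(self, destination: str) -> None:
        self._users[destination] -= 1
        if self._users[destination] == 0 and not self._is_station(destination):
            # No agent needs this route any more: free it.
            del self._tables[destination]
            del self._users[destination]

    def __contains__(self, destination: str) -> bool:
        return destination in self._tables


computed: list[str] = []


def compute_routes(destination: str) -> dict:
    computed.append(destination)  # stand-in for a real shortest-path search
    return {"destination": destination}


cache = PathPlanCache(compute_routes, is_assembly_station=lambda d: d.startswith("AS_"))
cache.acquire("cabin_701")
cache.acquire("cabin_701")
cache.release("cabin_701")
cache.release("cabin_701")
cache.acquire("AS_B")
cache.release("AS_B")
```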


3.5 Modelling Uncertainty 


The psychological and physiological attributes of humans are non-deterministic quantities. Even in a contrived experiment, human actions and reactions can hardly be reproduced, even if all of the conditions remain the same. This inherent unpredictability of human behaviour, especially under unusual and stressful circumstances, requires that human behaviour be modelled with some built-in uncertainty.


Demographics. All parameters related to human decision or action are modelled as random variables with user-defined probability distributions. This information, referred to as demographics, includes variables such as awareness/response time, gender and walking speed, among others, and is almost exclusively collected through observational research, using experiments that measure the response of people in controlled and uncontrolled environments. Typical demographic information is available from full-scale trials in the form of basic statistics; see for example [1] and [5]. This information, in conjunction with the probabilistic assumptions, is used to carry out Monte Carlo sampling to derive the values of response time and walking speed for each agent taking part in the simulation.
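Monte Carlo sampling of demographics can be sketched as below. The population groups, their shares and the uniform distributions for walking speed and response time are placeholders chosen for illustration; in EVI the distributions are user-defined and informed by data such as that in [1] and [5].

```python
import random
from dataclasses import dataclass


@dataclass
class PopulationGroup:
    name: str
    share: float                          # fraction of the population
    speed_range: tuple[float, float]      # walking speed bounds, m/s
    response_range: tuple[float, float]   # response time bounds, s


def sample_agents(groups: list[PopulationGroup], n: int, seed: int = 0) -> list[dict]:
    """Draw a group, walking speed and response time for each of n agents."""
    rng = random.Random(seed)  # seeded so that a run can be reproduced
    weights = [g.share for g in groups]
    agents = []
    for _ in range(n):
        g = rng.choices(groups, weights=weights)[0]
        agents.append({
            "group": g.name,
            "speed": rng.uniform(*g.speed_range),
            "response_time": rng.uniform(*g.response_range),
        })
    return agents


# Placeholder groups and numbers, for illustration only.
groups = [
    PopulationGroup("group_A", 0.6, (1.0, 1.5), (30.0, 120.0)),
    PopulationGroup("group_B", 0.4, (0.6, 1.1), (60.0, 240.0)),
]
population = sample_agents(groups, n=100, seed=1)
```

Each simulation run would draw a fresh sample (a different seed), which is what makes repeated runs of the same scenario differ.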


EVacuability Index (EVI). For the purpose of evacuation analysis, a number of performance measures can be evaluated, such as the time for a group of persons to clear a particular area (ESCAPE), the time for all agents to complete assembly after a signal (MUSTER), and the time for a group of agents to complete escape, muster and ship abandonment if these were carried out in sequence (EVACUATION). The choice of performance measure depends on the specific scenario being evaluated.

Considering the above, Evacuability is defined as the probability of the given objective (Escape, Muster, Evacuation, etc.) being achieved within a time t from the moment the corresponding signal is given, for a given state of the ship environment (env) and a given initial distribution (dist) of people in the environment. Thus, if the results of a number of simulation runs (with the environment and the distribution unchanged) are collected as a multi-set {t1, t2, t3, t4, ..., tn}, then by the law of large numbers Evacuability can be estimated with an accuracy that depends directly on the number of runs. For practical applications, at least 50 individual simulations of the same evacuation scenario are required, and from these results the 95th-percentile values are used for verification in accordance with IMO guidelines [1].
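Written as a formula, Evacuability(t | env, dist) = P(T <= t | env, dist), where T is the time to complete the objective; given n runs, it can be estimated by the fraction of runs that finish within t. The sketch below does this and also extracts a 95th percentile. It uses the nearest-rank percentile convention as an assumption, which may differ from the exact procedure prescribed in the IMO guidelines, and the run times are placeholders.

```python
import math


def evacuability(times: list[float], t: float) -> float:
    """Empirical estimate of P(T <= t) from n runs with the same env and dist."""
    return sum(1 for ti in times if ti <= t) / len(times)


def percentile_nearest_rank(times: list[float], p: float) -> float:
    """p-th percentile using the nearest-rank convention (an assumption here)."""
    ordered = sorted(times)
    rank = math.ceil(p * len(ordered) / 100)
    return ordered[max(rank, 1) - 1]


# Placeholder results from 50 runs (seconds), not real simulation output.
runs = [float(600 + 10 * i) for i in range(50)]
print(evacuability(runs, 900.0))           # 0.62
print(percentile_nearest_rank(runs, 95))   # 1070.0
```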


3.6 Scenario Modelling 


Based on the general aspects presented in Section 2, escape and evacuation scenarios 
may range from local escape from an individual zone of the ship (e.g. due to fire) to a 
complete ship evacuation (muster and abandon, e.g. due to a flooding incident). 

The impact of hazards associated with flooding and fire can be incorporated into EVI in both time and space. The software is capable of reading time histories of ship motions and floodwater in the ship compartmentation from time-domain flooding simulation tools such as PROTEUS-3.1 [7]. The impact of ship motions and floodwater on the agents is modelled by applying walking speed reduction coefficients that are functions of the inclination of the escape routes due to the heel and/or trim of the ship generated by the damage [5] [6]. The impact on the environment is modelled by treating regions directly affected by floodwater as inaccessible.
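The two flooding effects can be sketched as follows: a walking speed coefficient linearly interpolated from a table indexed by heel angle, and removal of flooded regions from the region graph. The table values are placeholders, not the coefficients of [5] or [6], and trim is ignored for brevity.

```python
import bisect

# Placeholder reduction table (heel angle in degrees -> speed coefficient);
# illustrative values only, not the coefficients of [5] or [6].
HEEL_DEG = [0.0, 10.0, 20.0, 30.0]
SPEED_COEFF = [1.0, 0.9, 0.7, 0.5]


def speed_coefficient(heel_deg: float) -> float:
    """Linearly interpolate the walking speed coefficient for a heel angle."""
    h = abs(heel_deg)
    if h >= HEEL_DEG[-1]:
        return SPEED_COEFF[-1]  # hold the last value beyond the table
    i = bisect.bisect_right(HEEL_DEG, h) - 1
    x0, x1 = HEEL_DEG[i], HEEL_DEG[i + 1]
    y0, y1 = SPEED_COEFF[i], SPEED_COEFF[i + 1]
    return y0 + (y1 - y0) * (h - x0) / (x1 - x0)


def accessible_graph(adjacency: dict[str, list[str]], flooded: set[str]) -> dict[str, list[str]]:
    """Treat flooded regions as inaccessible by removing them from the region graph."""
    return {region: [n for n in neighbours if n not in flooded]
            for region, neighbours in adjacency.items() if region not in flooded}
```

An agent's effective speed would then be its sampled nominal speed multiplied by `speed_coefficient(heel)` at the current time step, and path plans would be recomputed on the reduced graph.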

In terms of fire hazards, the software is capable of importing fire hazard information from fire analysis tools such as FDS [8]. Fire hazards are described in the form of parameters such as temperature, heat fluxes, concentrations of toxic gases (such as CO, CO2) and oxygen, smoke density, visibility, etc. The impact of these hazards on the agents is modelled by comparing them against human tolerability criteria [6].
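Comparing imported fire hazard values against tolerability criteria amounts to a threshold check per quantity at the agent's location. The quantity names and limit values below are placeholders for illustration only; they are not the criteria used in [6].

```python
# Placeholder limits for illustration only; they are NOT the criteria of [6].
LIMITS: dict[str, tuple[str, float]] = {
    "temperature_C": ("max", 60.0),
    "co_ppm": ("max", 1000.0),
    "o2_percent": ("min", 15.0),
    "visibility_m": ("min", 5.0),
}


def violated_criteria(exposure: dict[str, float]) -> list[str]:
    """List the hazard quantities at an agent's location that breach their limit."""
    violated = []
    for quantity, (kind, limit) in LIMITS.items():
        if quantity not in exposure:
            continue  # quantity not provided by the fire analysis for this location
        value = exposure[quantity]
        if (kind == "max" and value > limit) or (kind == "min" and value < limit):
            violated.append(quantity)
    return violated
```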


4 Conclusions 


This paper presents a high-level description of the concept and implementation of the multi-agent simulation tool EVI, a pedestrian dynamics simulation environment developed with the aim of undertaking escape and evacuation analysis of passenger vessels in accordance with IMO guidelines [1].

Multi-agent simulations are computationally intensive; however, they have become viable for practical engineering applications with the advent of cheap, high-performance computing.

The particular implementation of EVI combines a number of concepts and approaches 
which make EVI a versatile tool suitable for efficient and practical design verification. 

Owing to the inherent level of uncertainty in the process, driven by human behaviour, verification of the tool has been successfully achieved in terms of component testing and functional and qualitative verification [4][5]. Data for quantitative verification are still lacking.
Over the past 5 years, EVI has evolved into a consequence analysis tool for the design verification of passenger ships and SPS (offshore construction vessels, pipe-laying vessels, large crane vessels) subject to design risk analysis. Among these applications, the following can be highlighted:


• Verification of escape arrangements for alternative design and arrangements: this is part of the engineering analysis required in accordance with IMO MSC/Circ.1002, see Fig. 4;

• Escape, evacuation and rescue assessment for SPS (offshore construction vessels carrying more than 240 personnel on board), see Fig. 5;

• Analysis of turnaround time in passenger ship terminals, see Fig. 6.


Fig. 4. Verification of human tenability criteria for a layout fire zone 


Fig. 5. EVI model of a pipe-laying vessel (LQs with accommodation for 350 POB) for 
evacuation analysis 
Fig. 6. EVI model of a Ro-Ro passenger ferry at the terminal for turnaround time analysis
(2700 passengers disembarking) 
Abstract. In this paper, the results of a user experience (UX) goal evaluation study are reported. The study was carried out as part of a research and development project of a novel remote operator station (ROS) for container gantry crane operation in port yards. The objectives of the study were both to compare the UXs of two different user interface concepts and to give feedback on how well the UX goals ‘experience of safe operation’, ‘sense of control’ and ‘feeling of presence’ are fulfilled with the developed ROS prototype. According to the results, the experience of safe operation and the feeling of presence were not supported by the current version of the system. However, the results showed much better support for the fulfilment of the sense of control UX goal. Methodologically, further work is needed to adapt the Usability Case method used here to suit UX goal evaluation better.


Keywords: remote operation, user experience, user experience goal, evaluation. 


1 Introduction 


Setting user experience (UX) goals, sometimes also referred to as UX targets, is a recently developed approach for designing products and services for certain kinds of experiences. While traditional usability goals focus on assessing how useful or productive a system is from the product's perspective, UX goals are concerned with how users experience a product from their own viewpoint [1]. Therefore, UX goals describe what kinds of positive experiences the product should evoke in the user [2].

In product development, UX goals define the experiential qualities at which the design process should aim [2,3]. In our view, the goals should guide experience-driven product development [4] in its different phases. The goals should be defined in the early stages of design, and in later product development phases they should be considered when designing and implementing the solutions of the product. In addition, when the designed product is evaluated with users, it should be assessed whether the originally defined UX goals are achieved with it.
In the evaluation of UX goals in the case study reported in this paper, we have utilized a case-based reasoning method called Usability Case (UC). For details about the UC method, see for example [5]. To test empirically how well the method suits the evaluation of UX goals, we used it to evaluate the UX goals of a remote operator station (ROS) user interface (UI) for container crane operation. The details of the evaluation study case and the UC method used are described next.


2 The Evaluation Study Case


Our case study was carried out as part of a research and development project of a novel ROS for container gantry crane operation in port yards. Remote operation systems of this kind already exist in some ports around the world and are used, for example, for the landside road truck loading zone operation of semi-automated stacking cranes.

Both safety and UX aspects motivated the case study. Firstly, taking safety aspects into account is naturally important in traditional on-the-spot port crane operation, as people’s lives can be in danger. However, it becomes even more important when operating the crane remotely, because the operator is not physically present in the operation area and, for example, visual, auditory and haptic information from the object environment is mediated through a technical system. Secondly, although UX has traditionally not been a focus in the development of complex work systems, it has recently been discussed as a factor to be taken into account in this domain as well (e.g., [6]).

Hence, the aim of our project was to explore ways to enhance the UX of remote crane operators by developing a novel ROS operation concept that also takes into account the required safety aspects. To achieve this aim, we defined UX goals and user requirements based on our earlier field study. The field study (for details, see [7]) was conducted in two international ports and included operator interviews and field observations of their work. The UX goals were created at the beginning of the project and then used to guide the design work throughout the development of the new ROS. In addition, altogether 72 user requirements (counting both main and sub-requirements) were defined and connected to the created UX goals.

The overall UX theme for the new ROS was defined to be ‘hands-on remote operation experience’. After a deliberate process, the four UX goals chosen to realize this theme were ‘experience of safe operation’, ‘sense of control’, ‘feeling of presence’ and ‘experience of fluent co-operation’. Details about how these goals were chosen and what they mean in practice for the developed system can be found in [2] and [3]. The ‘experience of fluent co-operation’ goal could not be included in the evaluation study reported in this paper, as the functionalities supporting co-operation between different actors in operations were not yet implemented in the ROS prototype and the participants conducted the operations individually.

The main objectives of the conducted evaluations were twofold. Firstly, we wanted to compare the user experience of two alternative ROS user interface concepts that were developed during the project. Secondly, we aimed to obtain data from the evaluations on how well the UX goals ‘experience of safe operation’, ‘sense of control’ and ‘feeling of presence’ are fulfilled with the current ROS prototype system.
2.1 The Study Setting


The evaluations were conducted with a simulator version of the ROS system, which was operated with two industrial joysticks and a tablet computer (see Fig. 1 for a concept illustration). A 32-inch display placed on the operator’s desk provided the main operating view, which included virtual reality (VR) camera views and simulated but realistic operational data (e.g., parameters related to the weight of a container).


Fig. 1. Concept illustration of the ROS system with the four-view setup in the main display 


The main display’s user interface consisted of camera views and operational data provided by the system. Two different user interface setups for this display were implemented in the virtual prototype: a four-view setup (see Fig. 1 for a simplified concept illustration) and a two-view setup. Wireframe versions of the layouts of these two user interface setups for the main operating display can be seen in Fig. 2.


[Figure: Four-view setup: frontside lane camera views, overview camera view and backside lane camera views (top row); operational data, spreader camera views and operational data (bottom row); spreader lock status information. Two-view setup: spreader camera views and overview camera view (top row); operational data; spreader lock status information.]

Fig. 2. Wireframe versions of the two alternative main display setups of the concepts
Operation Tasks in Remote Container Crane Operation. Semi-automated gantry cranes in ports are operated manually, for example, when lifting containers from or lowering them onto road trucks visiting the port. These operations physically take place in a specific area called the loading zone. The cranes are operated manually from an ROS once the spreader (the device on the crane used for lifting and lowering containers) reaches a certain height in the loading zone during the otherwise automated operation. The remote operator uses real-time data and loading zone cameras to ensure that the operation goes safely and smoothly.


User Interface of the Four-view Setup. The user interface of the four-view setup (Fig. 1) included four distinct camera views: 1) the overview camera view (top middle), 2) the spreader camera view (bottom middle), which combined pictures from the four cameras attached to the corners of the spreader, 3) the frontside lane camera views (top left) and 4) the backside lane camera views (top right). Each of the lane camera views combined two video feeds from the corners of the truck into one unified view. The overview camera view could be switched between three separate camera views: an area view (seen in the top-middle view of Fig. 1), a trolley view (a camera shooting downwards from the trolley) and a booth view (a camera showing the truck driver’s booth in the loading zone). Different types of operational data were displayed on the left and right sides of the spreader camera view.


User Interface of the Two-View Setup. The user interface of the two-view setup (see Fig. 2) consisted of only two camera views, which were larger than those in the four-view setup: the spreader camera view on the top-left side and the overview camera view on the top-right side. Both views could easily be changed to show the relevant camera view in each phase of the task. For the left-side view, the lane camera views could also be chosen; for the right-side view, the aforementioned area, trolley and booth views could be chosen. Under the camera views, several crane parameters and various status information were displayed, in a slightly different order than in the four-view setup.


Control Devices of the Concepts. The joystick functions of the two- and four-view concepts differed. In the four-view concept, the left joystick’s functions were related to the overview camera (e.g., zoom, pan and tilt) and to moving the trolley or the gantry. The right joystick was used for special spreader functions such as trim, skew, opening/closing the twist locks (which keep the container attached to the spreader by its top corners) and moving the spreader up and down.

In the two-view concept, the joystick functions were optimized for the operation of the two camera views: the left joystick had controls related to the spreader view (e.g., skew and moving the spreader) and the right joystick had controls related to the overview view (e.g., the aforementioned camera operations).

The tablet, located between the joysticks, provided functions, for example, for changing the camera views: in the four-view concept, only the top-middle overview view could be changed, while in the two-view concept both the left and right side camera views could be changed. In addition, the received task could be canceled during operation or finalized after operation from the tablet.


2.2 Participants


In total, six work-domain experts were recruited as participants for the evaluation 
study. Three of them had previous experience in remote crane operation. All subjects 
were familiar with the operation of different traditional container cranes: two of them 
had over ten years of experience of operating different types of industrial cranes, three 
of them had 1-5 years of experience, and one of them had 6-10 years of experience. 


2.3 Test Methods


In order to evaluate how well the originally defined UX goals and user requirements are fulfilled with the evaluated prototype, we used a combination of different methods. During each evaluation session, the participant was first interviewed about his experience and opinions regarding crane operation. Then, the participant was introduced to the developed prototype system and asked to conduct different operational tasks with the two alternative concepts of the system.

The test tasks included container lifting and landing operations to and from road trucks in varying simulated conditions. The first task was for training purposes and included a very basic pick-up operation; its aim was for the participant to learn to use the controls and the simulator after a short introduction to them. To support joystick operation, the participants received a sheet of paper describing the function layouts of the joysticks.

The other operation tasks were more challenging than the first one and included different disruptive factors, such as strong wind, a container chassis colored almost the same as the container to be landed, other containers in the surrounding lanes, a truck driver walking in the loading zone, and a locked chassis pin. These tasks were conducted with both concepts, but not in the same order.

The two different concepts (the four- and the two-view concepts) were tested one at a time. The order of starting with the two-view or the four-view concept was counterbalanced: every other participant started with the two-view concept, and the remaining participants started with the four-view concept.

A short semi-structured interview was conducted after each operational task. In addition, two separate questionnaires were used to gather information: the first about the user experience and the second about the systems usability [8] of the concepts. The UX questionnaire consisted of twelve user experience statements rated on a five-point Likert scale. Participants filled in the UX questionnaire after completing all the tasks with a concept, so that it was ultimately filled in for both concepts.

At the end of the test session, some general questions related to the concepts were asked before the participants were requested to select the concept that they preferred and that, in their opinion, had a better user experience. Finally, a customized systems usability (see e.g., [8]) questionnaire was filled in for the selected concept. The systems usability questionnaire included thirty-one statements, which were also rated on a five-point Likert scale. Due to space restrictions, neither of the abovementioned questionnaires is presented in detail in this paper.

The test leader asked the participants to think aloud [9], if possible, while executing the operation tasks. The think-aloud protocol was used to make it easier for the researchers to understand how the participants actually experienced the developed concept solutions. The evaluation sessions were video recorded to aid data analysis.


2.4 Analysis 


The ultimate aim of the evaluations was to assess whether the chosen UX goals were fulfilled with the VR prototype version of the system. To do this, we used the Usability Case method, because we wanted to explore the suitability of the method for this kind of research. UC provides a systematic reasoning tool and reference for gathering data on the technology under design and for testing its usability in the targeted work [10]. The method applies a case-based reasoning approach, similar to the Safety Case method [11]. Throughout the development process, the UC method creates an accumulated and documented body of evidence that provides convincing and valid arguments for the degree of usability of a system for a given application in a given environment [5]. The main elements of UC are: 1) claim(s) (nine main claims of systems usability [8], of which three are related particularly to UX) that describe an attribute of the system in terms of usability (e.g., “User interface X is appropriate for task Y”), 2) subclaim(s) describing a subattribute of the system that contributes to the main claim (e.g., “X should work efficiently”), 3) argument(s) that provide grounds for analyzing the (sub)claims (e.g., “It is possible to quickly reach the desired result with X”), and 4) evidence, which is the data that provides either positive or negative proof for the argument(s) (e.g., task completion times in usability tests) [5].

In line with the UC method, the data gathered from our studies was carefully analyzed for each defined user requirement (i.e., a subclaim in UC) to determine whether positive or negative cumulative evidence was found for the fulfilment of that requirement. This fulfilment was judged on the basis of the arguments derived from the evidence. From the fulfilment of the different user requirements, it was possible to determine whether a certain UX goal (i.e., a claim in UC) was fulfilled or not: if most of the user requirements connected to a goal were met, the UX goal could also be said to have been fulfilled. In addition to this kind of evidence-based reasoning, the UC method also provided us with data on the usability and UX of the concepts under evaluation. These results support the design work by providing feedback for future development.
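The aggregation described above can be sketched in Python as follows. This sketch simplifies the procedure and rests on two assumptions. First, in the study, judging whether cumulative evidence supports a requirement was an analyst decision. Here, a requirement is instead assumed to be met when positive evidence items outnumber negative ones. Second, "most" is read as strictly more than half. The requirement names and evidence values are hypothetical and are not data from the study.

```python
from typing import Dict, List


def requirement_met(evidence: List[bool]) -> bool:
    """Assumed rule: a requirement (UC subclaim) is met when positive evidence
    items (True) outnumber negative ones (False). In the study this was an analyst judgment."""
    positives = sum(1 for e in evidence if e)
    negatives = len(evidence) - positives
    return positives > negatives


def goal_fulfilled(requirements: Dict[str, List[bool]]) -> bool:
    """A UX goal (UC claim) is treated as fulfilled if more than half of its requirements are met."""
    met = sum(1 for ev in requirements.values() if requirement_met(ev))
    return met > len(requirements) / 2


# Hypothetical goal with three hypothetical requirements (True = positive evidence).
example_goal = {
    "R1": [True, True, False],
    "R2": [True, True],
    "R3": [False, True, False],
}
print(goal_fulfilled(example_goal))  # True (2 of 3 requirements met)
```

A strict majority threshold keeps a tie (for example, one requirement met and one not met) from counting as fulfilment.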


3 Results 


The results of our studies are presented in the following order. First, we present general user experience and usability results that affected the chosen UX goals for both the four- and the two-view concepts. Then, we discuss which of the concepts the participants chose at the end of the test sessions and why. Finally, we discuss whether the defined UX goals were fulfilled and propose hypotheses about the underlying reasons for these results.


3.1 Notes on General UX and Usability of the Concepts


Four-view Concept. In general, the participants felt that the information provided by the main display’s four-view setup was appropriate and understandable. For example, they commented that the number of camera views presented at once was suitable and that most of the necessary information was available for the basic crane operations. However, some participants felt that some information, for example about possible fault conditions of the crane, was missing from the current solution.

While performing the test tasks, the participants most frequently used the area and spreader camera views. The spreader camera view was found to be useful especially at the beginning of a lifting task. However, as the spreader approached the container, it became more difficult to understand the exact position of the spreader relative to the container. In addition, the participants thought that the lane camera views did not support the beginning phase of the container pick-up operations, because they could not clearly comprehend the orientation of these views until the spreader was seen moving in them.

Regarding the joystick functions in the four-view concept, participants reported that the placement of some functions did not support the operations very well. For example, the positions of the skew and trim functions were not optimal: participants made frequent mistakes with them and reported becoming frustrated. In addition, participants proposed placing the zoom function together with the steering functions, i.e., on the right-hand joystick.

Overall, the responses to the UX questionnaire statements related to sense of control with the four-view concept were positive. The participants felt that they were able to start, conduct, and stop the operations at their own pace. In addition, according to the interviews, the joysticks were experienced as suitable for the remote operation of cranes, and their feel as sufficiently robust. The crane’s reactions to the joystick movements were also experienced as appropriate.

Nevertheless, the UX goal feeling of presence received less supportive results than sense of control. This was mostly due to problems identified with the solutions intended to fulfil the requirements concerning the operation view. For example, the four-view concept’s camera views were experienced as too small for the participants to easily see everything that was necessary. In addition, combining two camera views (in the lane cameras) received negative evidence: the participants had difficulty orienting themselves with the combined camera views and perceiving in which direction each camera was pointing.

The experience of safe operation with the four-view setup was reported to be negatively affected by the presentation layout of the operational parameters. For example, the grouping of the information was not experienced as matching the typical task flow of an operation.
Two-view Concept. According to the participants’ thinking-out-loud comments and interviews, the two-view setup in the main display was generally experienced as clearer than the four-view concept. For example, the camera views were found to be large enough to spot relevant things in the object environment. The area view in particular was used a lot during the operations, because it made it easier to see the spreader in relation to the container.

With the two-view concept, the users felt that all the needed operational information was available and in a logical order (i.e., in line with the typical task flow of an operation). For example, the participants mentioned that they could easily perceive the status of the operation from this information at a glance.

The UX questionnaire results for the statements related to sense of control with the two-view concept were positive, mostly for the same reasons as with the four-view concept. In addition, these results showed that the participants felt able to concentrate sufficiently on performing their operations with the two-view concept.

However, the UX goal feeling of presence received somewhat negative results in the tests. For example, the participants had difficulty perceiving the operation view provided through the different combined camera views. As with the four-view setup, it was especially hard to understand what was shown in the combined spreader and lane camera views. In addition, the camera views were not reported to support the comprehension of depth and of the distances between objects in the loading zone very well.

Furthermore, the results regarding requirements connected to the provided camera views were fairly negative. Some of the participants commented that, due to the placement of the camera views, they were not able to see critical task-related objects in the outermost truck lanes; for example, it was not easy to see all corners of the container and the truck’s position. These results had a significant effect on the experience of safe operation UX goal.


3.2 Concept Selection 


When asked at the end of the test session which of the two concepts they preferred, four of the participants selected the two-view concept and two chose the four-view concept. Based on the participants’ experience, the two-view concept was easier to understand. They reported that it was effortless to observe the loading zone through the large camera views, and said that the operational information was placed in a logical order. However, according to the participants, some of the joystick functions were placed better in the four-view concept than in the two-view concept.

In general, the results of the systems usability questionnaire were fairly positive for both concepts. These results were further used in the analysis of the fulfillment of the defined user requirements and UX goals, described in the next section.
3.3 Fulfilment of User Requirements and UX Goals


Most of the user requirements were not comprehensively fulfilled by either the four- or the two-view setup of the current prototype system. In particular, the evidence related to the user requirements connected to the UX goals experience of safe operation and feeling of presence was mostly negative. Therefore, these two goals cannot be considered fulfilled with the current versions of the ROS’s two- and four-view concepts.

The experience of safe operation was affected, for example, by the fact that the participants were not able to form a clear picture of the situation in the loading zone when handling containers in the outermost truck lanes. They therefore needed to adjust the cameras manually a lot to get a better view of the position of the truck and the corners of the container. In addition, the overview camera was not experienced as sharp enough (when zoomed in) for the participants to see whether the truck chassis pins were locked or unlocked when starting a lifting operation. The safety risk of this problem is obvious: if the pins are still locked when a container lifting operation starts, the truck will be lifted into the air along with the container.

The feeling of presence UX goal was negatively affected, for example, by the fact that some of the camera views (e.g., lane cameras) were difficult for the participants to understand and orient themselves with. Furthermore, the current camera views were not experienced as sufficient for understanding the distances between objects in the loading zone. In addition, some of the cameras’ default zoom levels were not well suited to the task at hand, so the participants had to do a lot of manual zooming. Fig. 3 gives an example of the Usability Case-based reasoning used for negative evidence on one requirement connected to the UX goal feeling of presence.


Claim: UX goal: Feeling of presence
Subclaim: User requirement 5.3: “The camera views should provide a unified view of the loading zone so that it can be easily understood”
Argument: Since half of the users reported to have difficulties in receiving the general picture of the loading zone through the camera views, this result provides negative evidence for the fulfilment of user requirement 5.3.
Evidence: Three of the users felt that they were confused with what each camera view is aiming at in the loading zone.

Fig. 3. Example of Usability Case based reasoning in our analysis.


The evidence in the Fig. 3 example consisted of negative comments that three different participants made while conducting the tasks with the ROS. Besides the participants’ verbal evidence (the thinking-out-loud comments and interview answers), the evidence included, for example, the results of the UX and systems usability questionnaires and task performance indicators. All this data was considered when creating the final Usability Case, which cannot be described here in full due to its large size.

Regarding the sense of control UX goal, the end results showed clear positive evidence for both concepts. For example, the joysticks were felt to be robust enough and to control the crane with an appropriate feel of operation. Overall, the participants felt that they were able to master the crane’s operations and concentrate on the task at hand. In addition, the freedom to decide when to start and stop operating and the ability to easily adjust the speed of operation with the joysticks were perceived as positive features. Therefore, sense of control can be considered achieved with both of the evaluated concepts.


4 Discussion 


The results indicate that the evaluated concepts had both positive and negative aspects. The design of the final concept should build on the positive aspects of both evaluated concepts. From the two-view concept, especially the placement of the operational data and the size of the camera views should be adopted in the final concept. From the four-view concept, for example, the layout of the joystick functions for the basic crane movements should be used.

In general, the results confirmed that providing real-time camera feeds is essential for this kind of remote operation. Visual validation of the situation in the object environment makes it possible to take into account extra variables that may affect the operation, such as weather conditions or debris on top of the container to be lifted. Good-quality camera views could therefore support the experience of safe operation and feeling of presence goals in the final system.

The ecological validity of the prototype system also needs to be discussed, as it may have affected the UX goals. First, the fact that the operations with the system were not happening in reality had an obvious effect on the participants’ user experience and attitude towards the operations. For example, if the people seen in the object environment had been real human beings instead of virtual ones, the participants might have been more cautious with the operations. This especially affected the experience of safe operation UX goal.

Second, the virtual camera views naturally cannot correspond to real camera views of the object environment. This had an obvious effect on the feeling of presence UX goal. However, it must be noted that some of the test participants thought that the virtual simulator was nearly equivalent to a real remote crane operation system, since the virtual camera views were rendered at such a high resolution. The simulator was also reported to provide a relatively precise feel of the operation, although, for example, the container did not sway as much as it would in real operations.

Third, in real life the operators communicate by phone with truck drivers in case of problems, and the absence of this interaction affected the ecological validity of the conducted tasks. In addition, the participants conducted the tasks individually in a small room, which is not the case in real remote crane operation work. Because the real work is much more social than our evaluation setting, this had an obvious effect on the validity of the results.
5 Conclusions


The study did not give a definitive answer to the question of which concept should be selected for future development. Both concepts had positive factors that should be taken into account when designing the final system.

The different camera views provided essential information about the operating area. For the final concept, a decision needs to be made on the number of cameras in the loading zone and on the camera views provided in the ROS to support safe crane operation. Another important factor is the size of the camera views in the main display. The two-view setup was experienced as having large enough views for the operation. A balance between the number and size of the views presented in the user interface needs to be found. If the display space of a single monitor does not allow sufficiently large camera views, the use of two monitors needs to be considered.

To some extent, it was possible to evaluate the user experience of remotely operated crane operations with our virtual simulator, even though the camera views were not real. However, the user experience of the system was not the same as it would be in a real work environment. For example, sounds, tones, and noises from the operating environment were not the focus of the concept development or of this evaluation study. In the development of the final system, careful attention should be paid to the auditory information the system provides from the object environment.

In general, most of the user requirements related to the UX goals feeling of presence and experience of safe operation were not supported by the evidence from the evaluation studies. The originally defined main UX theme of ‘hands-on remote operation experience’ therefore cannot yet be considered fulfilled with the current prototype system. In future development, the requirements that were not met should be carefully investigated and addressed with adequate solutions. In this way, the defined UX goals could also be better met with the final system.

Nevertheless, the evidence from our study supported the fulfillment of the UX goal sense of control for both concepts. In particular, the feel of the joystick operation and the reactions of the crane were experienced as appropriate and realistic. Support for aiming the spreader and the container at the correct position could enhance the sense of control even further in future versions of the UI.

In the future development of the ROS, special attention should also be paid to the experience of fluent co-operation UX goal and its different aspects (e.g., interaction with co-workers and truck drivers), as the present study could not address this goal appropriately. Future studies with the system should therefore include, for example, several test participants operating the system simultaneously, so that the operational setting is more realistic. To increase the ecological validity of the results, a more comprehensive study with a wider range of data inquiry methods could be carried out in a real control room setting with actual operators. Such a study could be conducted by adding some features of the proposed concept to ROS solutions already in use at a port and then evaluating whether the new features are useful and make the work more pleasant.

Methodologically, this paper contributes to the discussion on how UX goals can be evaluated. According to the results, although the evaluated concepts were still at quite an early stage of design, the Usability Case method seemed suitable for this kind of UX goal evaluation, with some modifications. First, further work is needed especially on linking the arguments regarding the user requirements to the detailed design implications (for details see e.g., [3]) of the UX goals. Second, a scoring method for the evidence provided by study data should be incorporated into the UC method in general, so that more emphasis could be placed on the data concerning the most critical parts of the evaluated product. Finally, future work should test whether data gathering methods other than those used here could provide relevant data for constructing the Usability Case. It should also study how the method supports later phases of UX goal-driven product development, beyond early-stage evaluation.
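One possible shape for the proposed evidence scoring is a criticality-weighted balance of positive and negative evidence. The sketch below is hypothetical. The weights, the [-1, 1] normalization, and the example items are assumptions made for illustration, and none of them are part of the UC method as published.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class ScoredEvidence:
    description: str
    positive: bool
    weight: float  # hypothetical criticality weight: higher = more critical part of the product


def weighted_score(items: List[ScoredEvidence]) -> float:
    """Return a normalized score in [-1, 1]: +1 if all weighted evidence is positive,
    -1 if all is negative, and 0.0 when there is no evidence at all."""
    total = sum(e.weight for e in items)
    if total == 0:
        return 0.0
    signed = sum(e.weight if e.positive else -e.weight for e in items)
    return signed / total


# Hypothetical items: one minor negative finding, one positive finding on a critical part.
items = [
    ScoredEvidence("Minor layout complaint", positive=False, weight=1.0),
    ScoredEvidence("Safety-critical view judged adequate", positive=True, weight=3.0),
]
print(weighted_score(items))  # 0.5
```

Normalizing by the total weight keeps scores comparable across requirements that have different amounts of evidence. The weights themselves would still have to come from an analysis of which parts of the product are most critical.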
Abstract. This paper describes ongoing efforts to address the challenges of supervising teams of heterogeneous unmanned vehicles through the use of demonstrated Ecological Interface Design (EID) principles. We first review the EID framework and discuss how we have applied it to the unmanned systems domain. Then, drawing from specific interface examples, we present several generalizable design strategies for improved supervisory control displays. We discuss how ecological display techniques can be used to increase the transparency and observability of highly automated unmanned systems by enabling operators to efficiently perceive and reason about automated support outcomes and purposefully direct system behavior.


Keywords: Ecological Interface Design (EID), automation transparency, unmanned systems, supervisory control, displays.


1 Introduction 


Unmanned systems play a critical and growing role in the maritime domain, with coordinated air and water vehicle teams supporting an increasing range of complex operations, from military missions to disaster response and recovery. Traditionally, unmanned vehicle operators have served as teleoperators, monitoring video or other sensor feeds and controlling vehicle behaviors through continuous “stick-and-rudder”-type piloting commands. However, significant advances in platform and sensor automation (e.g., flight control systems; onboard navigation; hazard detection; wayfinding capabilities) have increasingly offloaded these lower-level control tasks. This has allowed operators to focus instead on higher-order supervisory control activities, paving the way for a single operator or a small team of operators to manage multiple vehicles simultaneously.

Despite advances in autonomy, unmanned system operators still face significant challenges. As in other domains where operators supervise highly complex and automated systems (e.g., nuclear power, air traffic control), the introduction of support automation does not allow operators to simply shed control tasks and their associated workload. Rather, this automation shifts the emphasis of operator tasks from continuous display tracking and physical control inputs to activities that focus on system monitoring and understanding, coordination, and troubleshooting. In the case of managing autonomous vehicle teams, this supervisory role involves high cognitive workload. This workload arises both in monitoring system performance and in supporting frequent re-planning and re-tasking in response to evolving mission needs and changes to the operational environment. These activities place significant demands on operators’ taxed attentional resources and require operators to maintain detailed situation awareness to successfully detect and appropriately respond to changing conditions. Workload and potential for error are further increased when “strong and silent” automation support is not designed from inception to be observable by human users, making it difficult for operators to understand, predict, or control automated system behavior [1]. This typically results in users turning off or disregarding automated support tools or, paradoxically, completely trusting and over-relying upon automation even when it is insufficient [2].

The inherent challenges of supervisory control are further exacerbated by the growing size and heterogeneity of the unmanned vehicle teams themselves [3,4]. While automation provides significant support, operators of mixed-vehicle teams must still carefully consider and reason about the consequences of individual vehicle capabilities and performance parameters (e.g., platform speed, agility, fuel consumption and range; available onboard sensor and automation systems). They must also reason about safety-critical differences (e.g., a specific vehicle’s need to maintain a larger separation from other traffic due to its lack of onboard sense-and-avoid autonomy; the expected communication intermittencies and latencies for a long-duration underwater vehicle). Currently, many of these between-vehicle differences, and their associated mission impacts, are masked by opaque automation systems. This limits operators’ ability to reason about platform differences and predict how these will uniquely affect mission performance. When such information is made available through operator interfaces, it is typically buried within individual vehicle specifications, accessible only through serial, “drill-down” exploration methods. More critically, this vehicle-specific information is rarely related to higher-order mission goals, nor is it presented in a way that enables operators to anticipate or understand the behaviors of lower-level system automation. In this paper, we describe ongoing efforts to address these challenges by applying demonstrated principles of Ecological Interface Design [5,6].


2 Background 


The effective supervision of complex and highly automated sociotechnical systems, of which unmanned vehicle teams are but one timely example, presents unique challenges to human operators. In light of this, highly specialized interfaces are required that enable operators both to (1) readily perceive and reason about the critical functional connections across myriad system components, and (2) expertly identify and execute strategies that purposefully drive system behaviors. These interfaces must serve, in effect, to increase the transparency of otherwise opaque system automation and processes, providing operators with intuitive mechanisms for high-level understanding of, and interaction with, complex systems. Ecological Interface Design (EID) represents a promising and powerful approach to developing such interfaces.
The practice of EID stems from decades of applied research focused on understanding how expert knowledge workers monitor systems, identify problems, and select and execute response strategies in complex systems. While early applications typically focused on physical process systems, such as nuclear power generation and petrochemical refinement [9,10], the EID approach has been extended to settings as diverse as anesthesiology [11], military command and control [12], and the supervisory control of unmanned vehicles and robot teams [13,14]. The EID approach derives its name and underlying philosophy from theories of ecological visual perception [15]. These theories propose that organisms in the natural world are able to directly perceive opportunities for action afforded by elements of their surrounding environment (“affordances”) without the need for higher-order cognitive processing. Unlike cognitive, inferential activities, which are slow and error-prone, control actions or responses based on direct visual perception are effortless and can be performed rapidly without significant cognitive overhead.

EID techniques strive to capture similar intuitive affordances for control actions within highly automated and display-mediated systems, whose inner workings are otherwise fully removed and hidden from the operator. Within such complex technological systems, decision-critical attributes of the operational domain are typically described by abstract properties, in addition to traditional data resources. Examples include procedural doctrine, physical laws, mathematical state equations, and meta-information attributes (e.g., uncertainty, pedigree, recency of available information). In contrast to natural ecologies, these critical properties cannot be directly perceived and acted upon by human operators. For this reason, EID attempts to increase system transparency and observability to “make visible the invisible” [5]. It uses graphical figures to explicitly map such abstract properties, together with their tightly coupled relationships across system components, processes, and operational goals, to readily perceived visual characteristics of interface display elements (e.g., the thickness, angular orientation, or color of a line; the size or transparency of an icon).
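As an illustration of such a mapping, the Python sketch below converts two meta-information attributes named in the text, uncertainty and recency, into icon transparency and line thickness. The linear scaling, the value ranges, and all parameter defaults (min_alpha, max_age_s, min_width, max_width) are hypothetical choices and are not taken from any specific EID display.

```python
def uncertainty_to_alpha(uncertainty: float, min_alpha: float = 0.2) -> float:
    """Map uncertainty in [0, 1] linearly to icon opacity: certain data is fully
    opaque (1.0); maximally uncertain data fades to min_alpha (a hypothetical
    floor so the icon never disappears entirely)."""
    u = min(max(uncertainty, 0.0), 1.0)  # clamp out-of-range inputs
    return 1.0 - u * (1.0 - min_alpha)


def recency_to_width(age_s: float, max_age_s: float = 300.0,
                     min_width: float = 1.0, max_width: float = 4.0) -> float:
    """Map the age of a report (seconds) to line thickness: fresh data is drawn
    at max_width, and data older than max_age_s is drawn at min_width."""
    freshness = 1.0 - min(max(age_s / max_age_s, 0.0), 1.0)
    return min_width + freshness * (max_width - min_width)


print(recency_to_width(150.0))  # 2.5
```

Clamping keeps both mappings bounded and monotonic. A degrading data source therefore fades or thins gradually rather than vanishing, which preserves a perceivable cue for the operator.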

Various tools and methodologies have been proposed to generate such visual mappings from underlying analyses of the cognitive work domain [6,16,17]. Interface designers may also be able to incorporate or otherwise adapt a wide variety of demonstrated, reusable ecological interface display components [6]. Purposefully designed arrangements of these simple display elements can facilitate direct perception of system state and support the rapid invocation of operators’ highly automatic, skill- and rule-based control responses during normal operations. Also, because these graphics provide veridical visual models of system dynamics across multiple levels of abstraction, they provide a useful scaffold for supporting deep, knowledge-based reasoning about system behavior during novel situations or fault response [8,18].

Our own work builds upon and extends previous applications of EID to the unmanned systems domain, focusing specifically on the challenges of enabling operators to supervise teams of heterogeneous unmanned vehicles. In these situations, differences in the operating characteristics of individual vehicles (e.g., platform capabilities and handling, available sensor systems, extent of onboard autonomy) can have a profound impact on how the operator must interpret system information and interact with individual team components. In the remainder of this paper, we describe our ongoing applications of the EID approach to the unmanned systems domain and discuss several exemplar design outcomes from this process.


3 Approach 


The development of EID displays begins with a structured analysis of the work domain the interfaces are intended to support. Although specific approaches differ across the practitioner community, these underlying work domain analyses typically involve the development of an abstraction hierarchy (AH) model [18], often as part of a broader Cognitive Work Analysis (CWA) effort [5]. The AH structure provides a scaffold for representing the physical and intentional constraints that define what work can be accomplished within a technical system. An AH model describes these constraints across multiple levels of aggregation (e.g., system, subsystem, component) and functional abstraction. Connections between elements and across levels of abstraction in the model represent “means/ends” relationships, describing how individual, low-level system components relate to complex physical processes and the achievement of higher-order system goals. These maps closely correspond to the problem-solving strategies of system experts [18], and they are used to directly inform the underlying informational content and organizing structure of EID displays [6].
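The means/ends structure of an AH can be represented as a directed graph in which each element points to the higher-level ends it serves. The Python sketch below assumes the commonly used five-level formulation, from functional purpose down to physical form. The paper does not list the levels of its own models, and all node names here are hypothetical. Walking the links upward answers the "why?" question for a low-level component, such as which mission goals depend on a particular fuel tank.

```python
from typing import Dict, List, Set, TypedDict


class Node(TypedDict):
    level: int          # index into LEVELS (0 = most abstract)
    serves: List[str]   # higher-level "ends" this element is a means to


# A commonly used five-level abstraction hierarchy, most abstract first.
LEVELS = [
    "functional purpose",
    "abstract function",
    "generalized function",
    "physical function",
    "physical form",
]

# Hypothetical AH fragment for one vehicle in a heterogeneous team.
NODES: Dict[str, Node] = {
    "complete survey mission": {"level": 0, "serves": []},
    "energy budget": {"level": 1, "serves": ["complete survey mission"]},
    "coverage rate": {"level": 1, "serves": ["complete survey mission"]},
    "vehicle transit": {"level": 2, "serves": ["energy budget", "coverage rate"]},
    "UAV propulsion": {"level": 3, "serves": ["vehicle transit"]},
    "UAV fuel tank": {"level": 4, "serves": ["UAV propulsion"]},
}

# Sanity check: every means/ends link points from a more concrete level to a more abstract one.
assert all(
    NODES[end]["level"] < node["level"]
    for node in NODES.values()
    for end in node["serves"]
)


def ends_of(name: str) -> Set[str]:
    """Follow means->ends links upward and return every higher-level element served."""
    found: Set[str] = set()
    stack: List[str] = list(NODES[name]["serves"])
    while stack:
        current = stack.pop()
        if current not in found:
            found.add(current)
            stack.extend(NODES[current]["serves"])
    return found


# List what the fuel tank ultimately serves, ordered from concrete to abstract.
for end in sorted(ends_of("UAV fuel tank"), key=lambda n: (-NODES[n]["level"], n)):
    print(f"{end} ({LEVELS[NODES[end]['level']]})")
```

The same graph can be walked downward ("how?") by inverting the links. That is one way such a model can inform which low-level indicators a display should relate to each mission goal.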

To ground our own design efforts, we have developed multiple models across the naval unmanned systems domain, including abstraction hierarchies that focus on teams of heterogeneous vehicles operating collaboratively within a single mission context. These models have explored a number of operational scenarios built upon emerging concepts of operations for collaborative vehicle teaming. As such, they feature a number of elements that make supervisory control challenging, including large numbers of mixed military and civilian vehicle types in a constrained physical space, manned/unmanned traffic mixing, and communication intermittency. In developing our domain models, we have collaborated extensively with subject matter experts. This work builds upon a broad foundation of prior knowledge elicitation efforts, cognitive task analyses, and simulation-based modeling efforts that our team has conducted within the heterogeneous unmanned systems domain (see [3,4]). Throughout these efforts, we have considered how the constraints imposed by complex, dynamic operational environments affect the ability of a team of vehicles with varying capabilities to support mission goals. We have also explored operators’ need to understand and purposefully direct automation, particularly when interacting with vehicle tasking and route planning tools in dynamic operating environments with significant and shifting operational hazards, including weather and traffic.

Building upon these AH models, we have applied EID techniques to identify and explore methods to integrate two kinds of content: displays of relevant system information (e.g., airspace, bathymetry, and terrain maps; sensor data; vehicle health and status; weather reports; threat conditions; target locations) and automated planning products (e.g., vehicle routing and task assignments; alternative plan options; safety alerts). The aim is to integrate them in ways that facilitate operators’ awareness and deep understanding of critical system interactions, as well as of constraints and affordances for control. The outputs of these analytical efforts led to descriptions of key cognitive tasks and interaction requirements. These products drove multiple loops of design, prototyping, and evaluation activities, which allowed us to rapidly assess the technical risk and feasibility of emerging design concepts while simultaneously gaining feedback from domain experts and potential users. Key findings from these design efforts are described below.


4 Ecological Design Strategies for Automation Transparency 


Based on the modeling activities described above, we designed and prototyped a se- 
ries of ecological mission display concepts for supervising heterogeneous unmanned 
vehicle teams in a variety of operational contexts. These concepts ranged from indi- 
vidual, task-specific display forms (e.g., a widget optimized for managing available 
fuel considerations when addressing pop-up tasking; a multi-vehicle mission timeline) 
to full workspaces that incorporate and coordinate such display components within 
navigable views that can be tailored to address specific mission configurations and 
operator roles. Across these efforts, we have applied general EID design heuristics 
(see [6] for a comprehensive primer) to address the specific operator support needs, 
information requirements, and underlying functional structures gleaned from our 
domain analyses. The resultant interface solutions have been tailored to particular 
missions, vehicle configurations, and operator tasks. However, they also highlight a 
number of generalizable design strategies for increasing the transparency of un- 
manned systems, much as prior EID literature has provided similar exemplars for the 
process control and medical domains [6]. A subset of these applied EID strategies is 
discussed here. 


4.1 Increasing the Perceptual Availability of Task-Critical Information


One of the key challenges facing supervisory controllers is that of understanding and 
confirming (or recognizing the need to intervene and adapt) automated decision out- 
comes, such as vehicle tasking or path planning. To do this effectively—and avoid 
automation evaluation errors that can lead to surprise or disuse [7]—operators must 
recognize and efficiently access the key system variables that affect automated out- 
comes. Unfortunately, geospatial (map) displays, which are the dominant frame of 
reference for most supervisory control interfaces, do not comprehensively support this 
need. Geospatial displays excel at conveying spatial constraints, as seen in Figure 
1(a), where an automated path plan (blue line) can be intuitively perceived as avoid- 
ing a navigational threat (red circle) on its way to a target. However, when automation 
outcomes are driven by constraints that are not directly spatial in nature (such as the 
time it would take for a vehicle to reach a location, or the ability of a vehicle’s on- 
board hardware to support a specific sensing task), typical geospatial display ap- 
proaches are insufficient to support operator understanding. As seen in Figure 1(b), it 
may not be readily apparent why an automated planner has chosen to route a particu- 
lar vehicle to a target when other vehicles are physically much closer. 
Fig. 1. (a) When key planning constraints are spatial in nature, automated planner outputs may
be intuitively presented in a map display; (b) However, when key constraints are not directly 
spatial (e.g., the travelling speed of a vehicle; the efficacy of onboard sensor payloads), map- 
based displays of automated outcomes are much less intuitive 


In geospatial displays that use standard military symbology (MIL-STD-2525C [19]),
vehicle icons typically encode only spatial locations and gross platform differences
(e.g., whether a vehicle is friendly or foe, ground or air-based, fixed-wing or
rotary). In this case, to understand how individual vehicle differences have affected an 
automated tasking response, the operator must perform multiple drill-down searches 
through vehicle details, for example clicking on individual vehicles to identify their 
sensor payloads and travel speeds, as in Figure 2(a). With this approach, the operator 
must mentally consider and compare other vehicles to the one selected by the automa- 
tion, using a serial exploration process that is time consuming and places a significant 
load on working memory. 
Fig. 2. (a) Traditional drill-down display, with hidden data accessed serially through pop-up
windows; (b) An example of an ecological display alternative, with data provided in parallel 
through explicit visual cues—in this case time-to-location coded as icon size, and sensor effica- 
cy coded as icon opacity 


In contrast, ecological display approaches, such as Figure 2(b), can support opera- 
tors’ direct perception of the non-spatial considerations that led to an automated 
planner’s decision—in this case, by visually mapping calculations of sensor/target
pairing efficacy (icon opacity) and time-to-target estimates based on platform speed 
and distance (icon size). This mapping “makes visible the invisible,” while also in- 
creasing the perceptual salience of the most promising vehicle options (i.e., those that 
can get to the target both quickly and with ideal equipment). Combined, this enables 
the operator to rapidly consider alternative choices across the vehicle set in parallel, 
and intuitively interpret automated planning outcomes. While this example focuses on 
geospatial displays, we have similarly applied a range of chained visual transforma- 
tions (including manipulations of hue, saturation, blur, and animation effects; see 
[20]) across mission timelines, asset/task link diagrams, and health and status views. 
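The size and opacity encoding described for Figure 2(b) can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: it assumes straight-line travel at a constant, positive platform speed, a sensor/target efficacy score already normalized to [0, 1], and a linear mapping in which the fastest-arriving vehicle gets the largest icon. The `Vehicle` class, `icon_encoding` function, and pixel limits are hypothetical.

```python
import math
from dataclasses import dataclass


@dataclass
class Vehicle:
    # Hypothetical vehicle record used only for this sketch.
    name: str
    x_km: float             # map position in flat, local coordinates (assumption)
    y_km: float
    speed_kmh: float        # constant cruise speed, assumed > 0
    sensor_efficacy: float  # 0.0 (unsuitable) .. 1.0 (ideal) for this target


def time_to_target_h(v: Vehicle, tx_km: float, ty_km: float) -> float:
    """Straight-line travel time in hours, assuming constant speed."""
    return math.hypot(tx_km - v.x_km, ty_km - v.y_km) / v.speed_kmh


def icon_encoding(vehicles, tx_km, ty_km, min_px=12.0, max_px=36.0):
    """Map time-to-target to icon size and sensor efficacy to icon opacity.

    The fastest vehicle gets the largest icon and the slowest the smallest,
    so vehicles that are both quick and well equipped become most salient.
    """
    times = {v.name: time_to_target_h(v, tx_km, ty_km) for v in vehicles}
    t_min, t_max = min(times.values()), max(times.values())
    span = (t_max - t_min) or 1.0  # avoid dividing by zero when all times are equal
    encoding = {}
    for v in vehicles:
        closeness = 1.0 - (times[v.name] - t_min) / span  # 1 = fastest, 0 = slowest
        size_px = min_px + closeness * (max_px - min_px)
        opacity = max(0.0, min(1.0, v.sensor_efficacy))  # clamp to a valid alpha
        encoding[v.name] = (round(size_px, 1), opacity)
    return encoding


# A physically closer but slow, poorly equipped vehicle versus a distant, fast one.
team = [Vehicle("near-slow", 5, 0, 20, 0.4), Vehicle("far-fast", 40, 0, 200, 0.9)]
print(icon_encoding(team, 0, 0))
# {'near-slow': (12.0, 0.4), 'far-fast': (36.0, 0.9)}
```

This usage mirrors the situation in Figure 1(b). The distant vehicle arrives sooner (0.2 h versus 0.25 h) and carries the better sensor, so it is drawn largest and most opaque, which makes the planner's choice perceptible without drill-down.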

Beyond visually encoding the key system and environmental attributes that drive 
automation outcomes, we have also explored methods to visually represent automated 
behaviors themselves, and particularly the ways in which these may differ across 
heterogeneous vehicle teams. For example, differences in platform type and onboard 
sensing and processing capabilities may have a profound impact on how different
vehicles within a team respond to abnormal events, such as a lost communications
link. While better-equipped vehicles may be able to continue autonomously for some 
time on a pre-filed course in the absence of communications, it is also typical for 
many vehicles to continue at their current heading and altitude indefinitely, or to
abandon an established flight plan after only a short period of time and proceed directly
to a pre-configured emergency landing location.




Fig. 3. (a) A typical display, communicating only the location (and time) of a critical event 
(e.g., lost communications), and forcing the operator to reason about future vehicle behavior; 
(b) Example of an ecological display alternative, using explicit visual cues to inform and aug- 
ment the operator’s mental modeling of vehicle state 


Unfortunately, if they show anything at all, supervisory control displays often 
simply reflect the location, and possibly time, of a system state change (e.g., a comms 
link switching from “active” to “lost”), and not the impact of this event, as in Figure
3(a). This forces the operator to anticipate how the particular vehicle will respond to 
this new situation and invites significant opportunity for operator surprise in the event 
of an incorrect or misapplied mental model [1]. In contrast, an ecological approach 
such as that shown in Figure 3(b) increases system transparency by explicitly 
representing the processes governing system behavior. In this particular example, the
display not only indicates the lost communications event time and location, but also 
explicitly represents anticipated behavior based on the vehicle’s loaded operating 
protocol (proceeding directly to an emergency landing site), the estimated progress
against that plan in the time since the event, and the expected behavior should com- 
munications be regained (an immediate redirection to the next waypoint). 
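The "estimated progress since the event" shown in Figure 3(b) can be computed as sketched below. This is a simplified illustration, not the authors' model. It assumes a flat local coordinate frame, constant ground speed, and a hypothetical lost-link protocol that flies straight to the emergency landing site and holds there. `LostLinkEvent` and `projected_position` are illustrative names.

```python
import math
from dataclasses import dataclass


@dataclass
class LostLinkEvent:
    # Hypothetical record of a lost-communications event.
    t_event_s: float               # mission time when the link was lost
    position: tuple[float, float]  # vehicle position at that moment (km, local frame)


def projected_position(event: LostLinkEvent,
                       landing_site: tuple[float, float],
                       speed_kmh: float,
                       t_now_s: float) -> tuple[float, float]:
    """Estimate the vehicle's current position under an assumed protocol that
    flies directly to the emergency landing site at constant speed."""
    ex, ey = event.position
    lx, ly = landing_site
    leg_km = math.hypot(lx - ex, ly - ey)
    flown_km = speed_kmh * max(0.0, t_now_s - event.t_event_s) / 3600.0
    if leg_km == 0 or flown_km >= leg_km:
        return landing_site  # assumed to have reached (and be holding at) the site
    frac = flown_km / leg_km  # fraction of the diversion leg already flown
    return (ex + frac * (lx - ex), ey + frac * (ly - ey))


event = LostLinkEvent(t_event_s=0.0, position=(0.0, 0.0))
print(projected_position(event, (100.0, 0.0), speed_kmh=120.0, t_now_s=1800.0))
# (60.0, 0.0)
```

A display would redraw this projected marker every frame, so the operator sees the anticipated behavior advance along the diversion leg rather than having to extrapolate it mentally.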


4.2 Presenting Information in Context 


Beyond simply increasing the perceptual availability of task-critical information, 
ecological design techniques emphasize situating this information in context. As with 
more traditional process control systems [9,10], unmanned system displays benefit 
when health-and-status and automated planning outcomes (e.g., available pounds of 
fuel; engine speed; altitude; time-on-station) are provided against the framing of ex- 
pected values, nominal minimum/maximum ranges, and critical limits (e.g., total fuel 
capacity and minimum-remaining fuel requirements; normal and red-line RPM levels; 
aircraft flight performance envelopes). Additionally, useful temporal context can be 
provided by showing changes in data values over time (e.g., a trailing graph of engine 
pressure) or calculating and then graphically representing instantaneous rates of 
change (“engine temperature is 280 degrees, but RISING RAPIDLY”). Such visual
depictions of range and temporal context aid the operator in interpreting how current 
system operations compare to expected behaviors and critical safety boundaries, and 
support timely perception of when such boundaries may be breached. 
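The range and trend context described above reduces to a few comparisons, sketched here. The sketch assumes time-ordered samples and a two-point finite-difference rate. The `Limits` fields, the "rapid" threshold, and all numbers are illustrative assumptions, not values from the source.

```python
from dataclasses import dataclass


@dataclass
class Limits:
    # Illustrative limit set for one monitored variable.
    nominal_min: float
    nominal_max: float
    red_line: float    # critical upper limit
    rapid_rate: float  # units per second treated as "rising rapidly" (assumption)


def assess(samples: list[tuple[float, float]], lim: Limits) -> dict:
    """Place the latest value in its range context and report its trend.

    `samples` is a time-ordered list of (time_s, value) pairs with at least
    two entries; the rate is a finite difference over the last two samples.
    """
    (t0, v0), (t1, v1) = samples[-2], samples[-1]
    rate = (v1 - v0) / (t1 - t0)
    if v1 >= lim.red_line:
        band = "CRITICAL"
    elif lim.nominal_min <= v1 <= lim.nominal_max:
        band = "NOMINAL"
    else:
        band = "OFF-NOMINAL"
    # Time until the red line is reached if the current rate persists.
    eta_s = (lim.red_line - v1) / rate if rate > 0 and v1 < lim.red_line else None
    return {
        "value": v1,
        "band": band,
        "rate_per_s": rate,
        "rising_rapidly": rate >= lim.rapid_rate,
        "seconds_to_red_line": eta_s,
    }


temps = [(0.0, 250.0), (10.0, 280.0)]
report = assess(temps, Limits(180.0, 290.0, 320.0, 1.0))
print(report["band"], report["rate_per_s"], report["rising_rapidly"])
# NOMINAL 3.0 True
```

The value alone (280) sits inside the nominal band. The derived rate and projected time to the red line are what would justify a "RISING RAPIDLY" cue.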

Supervisory control displays can also be improved by presenting information 
attributes not only within the context of their own expected limits, but also within the 
context of other information that pertains to related system functions (with the struc- 
ture of these relationships identified through the previously described AH modeling 
process [5,6]). Unfortunately, many supervisory control displays artificially disperse 
related system information over discrete, stovepiped views (e.g., maps, timelines, 
health-and-status dashboards), both as a matter of convention and convenience. This 
approach inadvertently serves to mask critical relationships that occur across view 
boundaries—for example, relationships such as those between engine RPM, altitude, 
wind speed, and the aeronautical distance of a mission leg, all of which directly im- 
pact fuel consumption and, with it, available time on station. 

In contrast, EID methods purposefully seek to integrate these diverse representa- 
tion modalities within coordinated display perspectives that explicitly reflect these 
complex relationships. Figure 4(a) shows how such an approach could support common
fuel or power management tasks (which are often performed mentally during replanning,
relying on heuristics and estimations that are subject to calculation error). The
left-most image depicts estimated fuel to be consumed by each leg of the mission 
flight (green shaded segments) against the context of overall fuel capacity (full set of 
squares), the amount of fuel that is currently available (sum of all shaded squares), 
anticipated fuel reserves (dark grey), and the minimum amount of reserve fuel that is
required by mission safety doctrine (red line). If this particular vehicle is
allocated to a pop-up task (center image), the fuel cost of this activity is added to the 


386 R. Kilgore and M. Voshell 


display (indicated by light blue squares) and the total fuel consumption is visibly 
pushed beyond the minimum safe reserve amount required (indicated by red squares). 
As the operator directly manipulates elements of a coordinated mission plan display 
(not shown here, but see Figure 6 for an example)—perhaps by increasing the altitude 
of the first mission leg and reducing the travel speed of the third—the efficiency gains 
anticipated by these changes are represented directly within the context of the fuel 
display. The coordinated behaviors enable the operator to intuitively sense the
maximum gains to be had in manipulating attributes of a particular mission leg, as 
well as when the combined impact of some set of changes is sufficient to overcome 
the negative impact of the pop-up task on the fuel safety margin. 
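The arithmetic that the Figure 4(a) display externalizes is sketched below. It assumes per-leg fuel burn estimates are already available from the planner, and all quantities are invented values in arbitrary fuel units. `fuel_margin` and `apply_savings` are hypothetical helpers, not part of the described system.

```python
def fuel_margin(available: float, leg_burns: list[float],
                popup_burn: float = 0.0, min_reserve: float = 0.0) -> float:
    """Fuel remaining above the doctrinal minimum reserve after flying all
    legs plus an optional pop-up task. A negative result means the reserve
    requirement is violated."""
    remaining = available - sum(leg_burns) - popup_burn
    return remaining - min_reserve


def apply_savings(leg_burns: list[float], savings: dict[int, float]) -> list[float]:
    """Return new per-leg burns after savings (leg index -> fuel saved), e.g.
    from raising a leg's altitude or reducing its travel speed."""
    return [burn - savings.get(i, 0.0) for i, burn in enumerate(leg_burns)]


legs = [20.0, 25.0, 30.0]                    # estimated burn per mission leg
print(fuel_margin(100.0, legs, 0.0, 15.0))   # 10.0  (baseline plan)
print(fuel_margin(100.0, legs, 18.0, 15.0))  # -8.0  (pop-up task breaks the reserve)
adjusted = apply_savings(legs, {0: 4.0, 2: 5.0})
print(fuel_margin(100.0, adjusted, 18.0, 15.0))  # 1.0  (leg changes restore the margin)
```

Recomputing this margin continuously as the operator edits the flight plan gives the ecological display its key feedback: the moment the combined savings outweigh the pop-up cost.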




Fig. 4. (a) Example of an ecological fuel management display (left), showing the relative im-
pact of a pop-up task on available fuel reserves (center), as well as efficiency gains as altitude
and time-on-station variables are manipulated in a coordinated flight plan display (not pic- 
tured); (b) Example of an ecological mission coordination display, showing the relative impact 
of two different vehicle retasking options on overall team and mission efficacy as these plan 
alternatives are selected on a map (not pictured) 


In a similar example of context, Figure 4(b) shows a mission coordination display 
that presents the relative timing of vehicle activities with respect to established goals and
windows of opportunity. Continuing the example of the pop-up task, automated 
recommendations for vehicle-retasking strategies (and their resultant path plan mod- 
ifications) may be depicted in a map view (not shown here). As the operator explores 
alternative retasking plans by selecting them in the map view, this coordinated display 
provides a depiction of the relative impact on current vehicle tasking, against the tem- 
poral context of acceptable servicing windows (e.g., the time during which the current 
tasks must be completed for the mission to be of value). In this example, assigning 
Vehicle A to the new pop-up (left) results in a delay to the primary mission, but one
that is within acceptable bounds. In contrast, assigning Vehicle C not only pushes that
vehicle’s primary task out of the acceptable window, it also negatively impacts
Vehicle B, which must perform a coordinated task within a similar period of time. Such
explicit context enables the operator to readily assess automated behaviors.
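The window check behind Figure 4(b) can be sketched as below. The sketch assumes each task has a projected completion time and a latest acceptable completion time, and that each retasking option can be summarized as the delays it imposes on affected vehicles' tasks. The numbers are invented to reproduce the qualitative pattern in the text: assigning A stays in bounds, while assigning C pushes both C and B out. `Task` and `evaluate_option` are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Task:
    # Hypothetical primary-task record for one vehicle.
    vehicle: str
    eta_min: float         # projected completion time (mission minutes)
    window_end_min: float  # latest completion time at which the task still has value


def evaluate_option(tasks: list[Task], delays: dict[str, float]) -> dict[str, bool]:
    """Per vehicle, report whether its task still completes inside its
    servicing window after the delays a retasking option would cause."""
    return {
        t.vehicle: t.eta_min + delays.get(t.vehicle, 0.0) <= t.window_end_min
        for t in tasks
    }


tasks = [Task("A", 30, 50), Task("B", 40, 45), Task("C", 42, 48), Task("D", 20, 60)]
option_a = {"A": 15}          # A takes the pop-up; its primary task slips 15 min
option_c = {"C": 12, "B": 8}  # C slips, and B waits on its coordinated task with C
print(evaluate_option(tasks, option_a))  # {'A': True, 'B': True, 'C': True, 'D': True}
print(evaluate_option(tasks, option_c))  # {'A': True, 'B': False, 'C': False, 'D': True}
```

Evaluating every automated recommendation this way lets the coordinated display show the operator the knock-on effects of a choice, not just the selected vehicle's own delay.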


4.3 Managing Operator Attention


One of the central design strategies of EID is to create display figures whose emer- 
gent visual behaviors—driven by mapping graphical sub-elements of the figures 
to specific low-level attributes of the dynamic work-domain—reflect higher-order
system properties [5,6]. When designed well, these mappings (which can be as simple
as the scale and opacity icon transformation strategies shown in Figure 2(b)) modulate
the perceptual salience of elements across the display, automatically directing the 
operator’s attention towards critical system process information and causing less criti- 
cal information to recede into the background. 

An example of this salience mapping approach can be seen in Figure 5(a), where a
vehicle is traveling out of the range of its primary emergency landing site, at which
point the operator must confirm a secondary site. When this transition point is far
in the future, the boundary is flagged as a simple stroke across the planned path.
As the vehicle approaches this point, however, the stroke gradually grows in size
and salience, and additional cues (all of which would otherwise clutter the display)
are incrementally added to increase the salience of the pending alert, clarify the
specific nature of the alert type, and recommend a secondary site for selection, as in
Figure 5(b).
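A graded salience cue like the one in Figure 5 can be driven by a single urgency value, as sketched below. This illustration assumes urgency depends only on time remaining until the boundary (the source also mentions spatial proximity), and the look-ahead horizon, stroke widths, and stage thresholds are illustrative choices. `boundary_cue` is a hypothetical function.

```python
def boundary_cue(seconds_to_boundary: float,
                 horizon_s: float = 600.0,
                 base_width_px: float = 1.0,
                 max_width_px: float = 6.0) -> dict:
    """Return display properties for an upcoming event-boundary marker.

    Beyond the look-ahead horizon the marker stays a thin stroke; inside it,
    the stroke thickens linearly and extra cues are switched on in stages.
    """
    t = max(0.0, seconds_to_boundary)
    urgency = 1.0 - min(t, horizon_s) / horizon_s  # 0 = far away, 1 = at boundary
    cues = []
    if urgency >= 0.5:
        cues.append("alert-type label")            # clarify what kind of event this is
    if urgency >= 0.8:
        cues.append("recommended secondary site")  # suggest a site for confirmation
    return {
        "width_px": base_width_px + urgency * (max_width_px - base_width_px),
        "cues": cues,
    }


for s in (1200, 300, 0):
    print(s, boundary_cue(s))
```

Staging the extra cues behind thresholds keeps the display uncluttered while the event is distant, which is the trade-off the text describes.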




Fig. 5. Example of an ecological display using variable perceptual salience cues to direct opera- 
tor attention while managing clutter; (a) a simple marker (green arc) flags an upcoming event 
boundary; (b) as the vehicle approaches the marker (both in space, and in time) the cue be- 
comes more salient and additional information regarding the anticipated automation behavior is 
provided—in this case signaling that the aircraft is about to head out of range of the primary 
emergency landing location and the operator must confirm a secondary location 


Unfortunately, it is impractical to support all management of operator attention 
through emergent display features—both because display designs would quickly be- 
come overwhelmingly complex, and because not all requirements for directing opera- 
tor attention can be known a priori. Through our interactions with operators—both 
during our analyses of the work domain and in our subsequent walkthroughs of our 
design prototypes—we learned that it is often the relative amount of time until key 
events (e.g., “check the available fuel and confirm the emergency landing site location
when we are five minutes outside of the search area”) that is more critical to cueing
and directing operator attention than absolute timing (e.g., “check the fuel at 1345”), 
particularly when the future time in question is not easily calculated from available 
display information. This is particularly true in directing operators’ prospective mem- 
ory, or the memory to recall and perform a task in the future. Unfortunately, supervi- 
sory control displays rarely capture and manage such relative times explicitly. Instead, 
they must be calculated and then later recalled by the operator, often via physical 
reminders, such as Post-it Notes. In addition to being error prone under high-
workload settings, these methods are divorced from the actual supervisory control 
information system and thus become inaccurate (or irrelevant) when changes occur to 
vehicle plans or in the environment (e.g., an unanticipated headwind aloft adds 30 
minutes of travel time to the search area). 




Fig. 6. Example of a prospective memory aid in the context of a mission timeline/altitude dis-
play; the operator has chosen to pin a notification not to a specific mission time, but rather to a
relative one, in this case the traversal of a marked airspace (purple shading)


To address this need, we have designed a number of light-weight interaction me- 
thods that enable operators to readily establish and manipulate such relative-time 
reminders within the display itself. For example, as shown in Figure 6, the operator 
can select an element within a timeline display—such as a waypoint, or a marked 
airspace that is being traversed—and with a single click, pin a notification to the start 
of that event, regardless of the absolute mission time at which it occurs. Similarly, the 
operator can leave reminders by interacting with route plans, waypoints, or other ob- 
jects across the display (e.g., selecting a distance or time range from a location pin on 
a map), aiding their future recall to perform critical control tasks. 
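The distinction between relative and absolute reminders can be sketched as follows. A reminder pinned to a plan element stores only an anchor and an offset, and its trigger time is re-derived from the current plan whenever it is needed, so it stays correct when the plan shifts. The `Plan`, `Reminder`, and `trigger_time` names, the event labels, and the times are all hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class Reminder:
    anchor_event: str  # plan element the reminder is pinned to
    offset_min: float  # minutes relative to that event (negative = before)
    text: str


@dataclass
class Plan:
    # Hypothetical plan: event name -> scheduled mission time in minutes.
    event_times_min: dict[str, float] = field(default_factory=dict)

    def delay_from(self, event: str, minutes: float) -> None:
        """Push `event` and every later event back by `minutes`."""
        pivot = self.event_times_min[event]
        for name, t in list(self.event_times_min.items()):
            if t >= pivot:
                self.event_times_min[name] = t + minutes


def trigger_time(plan: Plan, r: Reminder) -> float:
    """Absolute trigger time, derived from the current plan on every call."""
    return plan.event_times_min[r.anchor_event] + r.offset_min


plan = Plan({"depart": 0.0, "enter_search_area": 120.0, "exit_search_area": 180.0})
check = Reminder("enter_search_area", -5.0, "Check fuel; confirm emergency landing site")
print(trigger_time(plan, check))  # 115.0
plan.delay_from("enter_search_area", 30.0)  # e.g., an unanticipated headwind aloft
print(trigger_time(plan, check))  # 145.0
```

A reminder written as an absolute time (115 minutes) would now fire half an hour early. The anchored reminder follows the plan change automatically.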


5 Conclusions 


This paper has presented the results of several recent and ongoing efforts to improve 
the transparency of unmanned system automation through the design of ecological 
supervisory control displays. Although this work has focused on supporting specific 
missions, vehicle teams, and operator tasks within the maritime domain, we believe 
that many of the display concepts described may be generally applied to the design of 
supervisory control tools for heterogeneous unmanned systems. As such, we hope 
these concepts provide useful resources for other developers of unmanned systems. 
We are currently undertaking an effort to further refine these and other related design 
concepts, as well as to formally evaluate their utility in enabling operators to more 
efficiently and effectively supervise heterogeneous teams of unmanned vehicles. 
Based on the outcomes of these evaluations, we hope to leverage our efforts to guide 
capabilities requirements and design guidelines for new and emerging unmanned 
vehicle control systems, such as the Navy’s Common Control Station. 


Abstract. The need for reusable 3D visualization tools has been raised continuously
across various industries. In the defense modeling and simulation field especially,
there is abundant research on reusable and interoperable visualization systems, since
such systems play a critical role in efficient decision-making by supporting diverse
validation and analysis processes. To further improve effectiveness, many currently
operating systems are actively adopting VR (Virtual Reality) and AR (Augmented
Reality) technologies. Against this background, we designed a collaborative
visualization environment for warfare simulation based on a commercial game engine.
We define the requirements by analyzing the advantages and disadvantages of existing
tools and engines such as SIMDIS and Vega, and propose methods for using the
functionality of a commercial game engine to satisfy these requirements. The
implemented prototype provides a collaborative visualization environment inside a
CAVE, a facility for immersive virtual environments, in cooperation with handheld
devices.


Keywords: 3D Visualization, Game Engine, Warfare Simulation, Collaborative 
Visualization Environment. 


1 Introduction 


At present, 3D visualization tools are employed both directly and indirectly in re- 
search fields that require intuitive analysis and accurate data validation. A well- 
known application can be found in the product design field, which moved on to the 
3D CAD (computer-aided design) system from paper drawings. Moreover, in the pre- 
manufacturing stage, the manufacturing process can now be simulated and analyzed 
based on the 3D visualization environment. Analysis and validation based on this type
of 3D visualization are more effective than traditional reports of numerical values or
parameters [1]. The demand for 3D visualization techniques in the defense modeling
and simulation field is also evident: current commercial battle lab systems actively
include 3D visualization functions, as do training simulators, which inherently require
3D visualization capabilities.
However, several issues require attention when adopting 3D visualization in the
defense modeling and simulation field. The main obstacle is the time and cost required
for development. Because new defense modeling and simulation systems are frequently
developed at relatively high cost, interoperability is mandatory: it can reduce the
duplicated costs incurred during the development, interoperation, and maintenance of
the system. Moreover, interoperability is closely related to reusability, as reusability
is ensured once the interoperability requirement is satisfied to a certain level.

In this research, our goal is to develop an efficient decision-making environment
for experts by providing 3D visualization of warfare simulation results. We term this
a collaborative visualization environment. The proposed collaborative visualization
environment is a restricted form of the CVE (collaborative virtual environment) whose
physical space is a CAVE (cave automatic virtual environment). Because the CAVE is
intended to provide an immersive environment
through a high-resolution multi-channel visualization system, it provides a satisfacto- 
ry user experience. However, the system is usually designed for a single user with one 
shared screen. Therefore, we provide a collaborative visualization environment by 
adopting already widespread personal devices, in this case smartphones and tablets.
The interoperability and reusability problems are addressed together with the
visualization quality and collaboration issues.


2 Related Research 


Related research and cases can be divided into three categories based on
interoperability level. The first category consists of system-dependent development,
which is widespread in currently operating simulators. The second category consists
of work based on HLA (High Level Architecture)/RTI (Run-Time Infrastructure), which
is the IEEE 1516 standard. The third category proposes custom data structures that
address reusability while improving performance.

First, a common approach across various systems is to develop an integrated structure
combining the simulator and the visualization module. In [2], a large-scale
visualization system was adopted for the digital mock-up and driving simulation of the
evaluation process in the maglev business. The Ogre3D engine was used for 3D
visualization in that research, based on a classification and comparison of diverse
graphics engines and toolkits. In [3], a simulation architecture based on a
visualization engine was proposed for real-time visualization of defense simulation
systems; by providing plug-in functions for manipulating the visualization algorithm,
it also lets users customize their visualization results. In [4], the objective was
similar to that of [3], but the focus was on representing synthetic environments that
can be used in defense modeling and simulation systems. In [5-7], ground and aerial
warfare simulation systems were developed with the simulator and visualization module
in an integrated structure. In [5], XNA, a commercial game engine, was used, and in
[7], X-Plane was used for visualization. In [6], the authors pointed out the poor
cost-effectiveness of commercial visualization tools, so they developed a novel LOD
(level of detail) control algorithm, addressing what they described as a key problem
in visualizing aerial warfare simulations.

In these previous studies, the visualization function was developed as part of a
supporting tool. Because the visualization module was integrated with the system, it
is system dependent, and redundant development costs are unavoidable, as described in
the introduction.

Typically, in the defense modeling and simulation field, using HLA/RTI is considered
an efficient way of solving this problem [8]. HLA/RTI guarantees interoperability and
improves reusability by standardizing how the middleware is configured. In [9], the
authors pointed out the interoperability problem arising from the use of various 3D
models in a single simulation system, and therefore proposed a Scene Simulation
Platform based on HLA/RTI. In [10], X-Plane and Google Earth were designed to
interoperate via HLA/RTI to make geospatial information available in the simulator.
Furthermore, the simulation results were visualized in Google Earth by logging them to
a KML (Keyhole Markup Language) file. In [11] and [12], HLA/RTI was extensively
adopted to enable real-time monitoring and visualization by constructing the
visualization module as a separate federate. Since all of these systems are
HLA-compliant, they can interoperate efficiently on the basis of the standard.

However, semantic interoperability has not been fully investigated in previous
research. Semantic interoperability requires data to be exchanged in an unambiguous,
shared manner, which in turn depends on each individual system's capability to
analyze and handle that data. HLA/RTI can guarantee syntactic interoperability between
systems, but it gives little consideration to semantic interoperability. Commercial
tools such as Vega [13] and MetaVR [14] address this by providing a user interface, an
API (Application Programming Interface), and additional packages for extending the
visualization system to be semantically interoperable. This can be an efficient
visualization solution if there are no restrictions on the target simulator/federate.
However, the cost and time required to extend these tools for semantic
interoperability are still considerably high.

Another way to obtain semantic interoperability is to limit the data that a system can
handle. In [15], the author proposed the Universal Heterogeneous Database Metadata
Description, which enables an integrated description of the battlefield through a data
structure capable of representing heterogeneous simulation results. In [16], an XML
(Extensible Markup Language) schema was proposed to represent the states of objects in
web-based battlefield visualization. In [17], a data model for construction simulation
was proposed and its results were visualized. Finally, SIMDIS [18] defines the ASI
format for a similar purpose and provides a visualization tool for defense modeling
and simulation systems.

In summary, visualization tools that depend on a particular simulation system suffer
from redundant development costs. Research based on HLA/RTI has attempted to solve
this problem, but research addressing semantic interoperability is still required. For
now, semantic interoperability can be achieved by limiting the data that a system can
handle.
3 Proposed Method


3.1 Overview 


In this research, we adopt the approach of limiting the data that a system can handle
in order to develop a collaborative visualization environment. This approach allows
the efficient development of a reusable visualization tool with current technology.
The data is intended to represent the results of a warfare simulation. In addition, to
provide an efficient collaborative visualization environment for decision-making
processes among experts in a 3D visualization environment, we develop a system based
on the CAVE, which combines state-of-the-art visualization techniques with ubiquitous
devices.

To address these issues, we build on two currently available technologies:


1. SIMDIS data file structure 
2. 3D visualization and networking technologies in a commercial game engine


First, SIMDIS is a well-known analysis and display tool developed by the NRL (Naval
Research Laboratory). SIMDIS can be used for result analysis in the defense modeling
and simulation field. One of its advantages is that no implementation work is needed
to set up a visualization session: the ASI data file structure is well defined and
covers a wide range of warfare-related simulation results, so simply logging or
parsing the simulation results allows instant visualization. Various use cases and
related research demonstrate the semantic interoperability of SIMDIS in the defense
modeling and simulation field. By adopting the SIMDIS data file structure, our tool
achieves a certain level of semantic interoperability, as well as syntactic
interoperability with existing simulators that have functions for logging their
results to a SIMDIS data file.

On the other hand, SIMDIS lacks functions for game-like scene generation, unlike 
other visualization toolkits such as Vega or MetaVR, as SIMDIS focuses more on 
objective analysis. To meet the need for game-like scene generation efficiently, we
decided to use a commercial game engine. Current commercial game
engines are capable of relatively high performance for 3D visualization, and the net- 
working functions of a game engine can be employed to construct a collaborative 
environment. 


3.2 Hardware Configuration


The collaborative visualization environment proposed in our research is based on
the CAVE facility. The iCAVE facility at KAIST was built to provide an immersive
virtual environment with a resolution of 6400 × 1920 pixels over a 120° field of view,
using a seven-channel display on a cylindrical screen. The scene for each channel is
controlled by a single multi-channel client running on a desktop PC, and a main
controller PC manipulates the entire visualization system. Furthermore, personal
handheld devices cooperate with the main controller to provide domain-specific data to
individual users. The overall hardware configuration is illustrated in Fig. 1.
The main controller PC and the multi-channel clients generate the scene through
distributed visualization based on a master-slave scheme. Handheld devices are
connected to the main controller PC to synchronize their visualization time with the
scene generated on the shared screen.


Fig. 1. Overall hardware configuration (components shown: screen, multi-channel visualization system, main control, tablet/smartphone)


3.3. Module Design 


The entire system has a modular structure for easy maintenance. Each module performs 
independent functions, and data are exchanged through an interface designed 
specifically for this research. Therefore, if an update is required in the future, only 
the affected module needs to be changed. 

The modules are the data processing module, the weapon system visualization module, 
the terrain/environment visualization module, the animation module, the graph plot 
module, the user interface module and the multi-channel module. The weapon system 
visualization module and the terrain/environment visualization module also each have an 
interface to their own 3D model database. The layered module structure is illustrated in 
Fig. 2. As noted in Section 3.1, the upper-level visualization modules designed in this 
research are built on the component functions of a commercial game engine. These modules 
run on each hardware platform to provide collaborative visualization. 
Fig. 2. Layer structure of the system and modules 
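The interface-based data exchange between modules described above can be sketched as a small publish/subscribe hub. This is a minimal sketch under our own assumptions: the class names `Module`, `Interface` and `AnimationModule`, the topic strings and the `handle` method are hypothetical and do not reflect the authors' actual Unity3D scripts.

```python
from abc import ABC, abstractmethod


class Module(ABC):
    """Hypothetical common contract that every module implements."""

    @abstractmethod
    def handle(self, topic, payload):
        """React to a message published on a topic this module subscribed to."""


class Interface:
    """Hypothetical hub through which modules exchange data without direct references."""

    def __init__(self):
        self._subscribers = {}

    def register(self, topic, module):
        self._subscribers.setdefault(topic, []).append(module)

    def publish(self, topic, payload):
        # Deliver the payload to every module registered for this topic.
        for module in self._subscribers.get(topic, []):
            module.handle(topic, payload)


class AnimationModule(Module):
    """Toy stand-in for the animation module: it only records platform updates."""

    def __init__(self):
        self.received = []

    def handle(self, topic, payload):
        self.received.append((topic, payload))


hub = Interface()
animation = AnimationModule()
hub.register("platform_state", animation)
hub.publish("platform_state", {"id": 1, "time": 0.0})
```

Because a publisher only talks to the hub, one module implementation could be swapped for another without touching the others, which is the maintenance benefit the modular design aims for.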


3.4 Development Environment 


With the module design in place, a comparative study was carried out to determine the 
most suitable development environment. The comparison factors are visualization, user 
interface, networking and compatibility, and each factor is evaluated in terms of its 
performance and its efficiency during the development process. Compatibility refers to 
support for the handheld devices, which are Android smartphones. Finally, cost is also 
an important factor, as our system is based on the CAVE environment, which in some cases 
requires a separate license for each client PC. The results are summarized in Table 1. 

First, SIMDIS is a complete tool package that lacks extensibility compared to 
visualization engines, which makes multi-channel visualization difficult to achieve. OSG 
(OpenSceneGraph), Ogre3D and Delta3D are open-source rendering/game engines with similar 
characteristics that differ mainly in their detailed features: for example, Ogre3D is 
weaker at geospatial data handling, while Delta3D supports HLA/RTI-related features. 
Vega and Unity3D are high-level engines with greater functionality than the others. Vega 
offers various functions in the form of plug-in packages, but its cost is not negligible. 
Unity3D, on the other hand, is one of the most widely used engines in the current game 
market, and its range of application is expanding to engineering and science. In 
particular, its extensibility and compatibility with diverse platforms are considered 
major advantages, and its cost is relatively low. These advantages can lower the cost and 
shorten the development time. We therefore adopted the Unity3D engine, given its good 
cooperation with handheld devices in a collaborative visualization environment and its 
sufficient 3D visualization quality. 
Table 1. Comparison of the development environments 
4 Module Development 


In Unity3D, an application is constructed by attaching components to scene graph nodes, 
known as GameObjects. The components, including rendering, networking and particle 
effects, together with the scripts that encapsulate them, enable efficient development. 
Thus, each upper-level module consists of a set of scripts that control both the 
engine's component functions and external libraries. 


4.1 Data Processing Module 


The data processing module parses the simulation results, stores them in the defined 
data model, and passes the data to the other modules through the interface. In this 
research, we use the SIMDIS data format to define the data model for defense modeling 
and simulation. The data model is a class that includes the overall scenario information 
(reference coordinates, reference time), the platform information (platform ID, 
classification and the name of the 3D model) and the platform data (position, velocity 
and orientation over the simulation time), among other data. The core function of the 
data processing module is to parse the simulation results into this class and hand the 
data over to other modules when the corresponding requests are detected. 
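A minimal sketch of parsing a simulation log into such a data model is shown below. It assumes a simplified, line-oriented, ASI-like text format: the keyword names (`RefLLA`, `PlatformID`, `PlatformName`, `PlatformIcon`, `PlatformData`) are intended to mirror SIMDIS ASI keywords, but the field order used here (especially for `PlatformData`) is an assumption for illustration, and the real ASI grammar is richer. The classes `Scenario`, `Platform` and `PlatformState` are hypothetical.

```python
import shlex
from dataclasses import dataclass, field


@dataclass
class PlatformState:
    time: float
    position: tuple      # (x, y, z)
    orientation: tuple   # (yaw, pitch, roll)


@dataclass
class Platform:
    platform_id: int
    name: str = ""
    model: str = ""                       # name of the 3D model to load
    states: list = field(default_factory=list)


@dataclass
class Scenario:
    ref_lla: tuple = (0.0, 0.0, 0.0)      # reference latitude, longitude, altitude
    platforms: dict = field(default_factory=dict)


def parse_scenario(lines):
    """Parse a simplified, ASI-like text log into the Scenario data model."""
    scenario = Scenario()
    for raw in lines:
        tokens = shlex.split(raw, comments=True)  # handles "quoted names" and # comments
        if not tokens:
            continue
        key, args = tokens[0], tokens[1:]
        if key == "RefLLA":
            scenario.ref_lla = tuple(float(a) for a in args[:3])
        elif key == "PlatformID":
            pid = int(args[0])
            scenario.platforms[pid] = Platform(pid)
        elif key == "PlatformName":
            scenario.platforms[int(args[0])].name = args[1]
        elif key == "PlatformIcon":
            scenario.platforms[int(args[0])].model = args[1]
        elif key == "PlatformData":
            # Assumed field order: id, time, x, y, z, yaw, pitch, roll
            pid = int(args[0])
            t, x, y, z, yaw, pitch, roll = (float(a) for a in args[1:8])
            scenario.platforms[pid].states.append(
                PlatformState(t, (x, y, z), (yaw, pitch, roll)))
    return scenario
```

Unknown keywords are simply skipped, so a log containing additional record types still yields the platforms and states the other modules need.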
4.2. Weapon System Visualization Module 


The weapon system visualization module manages a range of data stored by the data 
processing module, such as scenarios, platform information and the weapon model 
database. It also loads the 3D model data needed for visualization according to the 
scenario. To generate a game-like scene, the module can add particle effects such as 
smoke and flames to the 3D models. The left side of Fig. 3 shows a visualization result 
in Unity3D produced by the weapon system visualization module. 


4.3. Terrain/Environment Visualization Module 


The terrain visualization module loads, from the terrain database, the terrain model 
near the reference coordinates of the scenario. In addition, it renders the sea and the 
atmospheric environment to construct the overall environment corresponding to the 
scenario. The terrain database carries geographic information, because the 3D terrain 
polygon models created by preprocessing a DEM (digital elevation map) and satellite 
images contain latitude and longitude information. An ocean surface model is generated 
according to the camera projection matrix to create an unbounded ocean surface, and 
clouds are created using a volume model for a realistic atmospheric scene. Additionally, 
an underwater effect using particles and seabed terrain built from observed bathymetry 
data are implemented to render the underwater view. The right side of Fig. 3 shows a 
visualized environment using the terrain near Ulleung Island, Korea. 


Fig. 3. Weapon system and environments around Ulleung Island visualized with Unity3D 
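Selecting the terrain models near the scenario's reference coordinates can be sketched as a distance query over tile centres. This is a minimal sketch under stated assumptions: the tile index (a dictionary of hypothetical tile names mapped to the latitude/longitude of each tile centre) is invented for illustration, since the text does not describe the database layout, and distances use a spherical-Earth great-circle (haversine) approximation.

```python
import math

EARTH_RADIUS_M = 6_371_000.0  # mean Earth radius (spherical approximation)


def haversine_m(lat1, lon1, lat2, lon2):
    """Great-circle distance in metres between two latitude/longitude points."""
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dphi = p2 - p1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(p1) * math.cos(p2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(a))


def tiles_near(tiles, ref_lat, ref_lon, radius_m):
    """Names of hypothetical terrain tiles whose centre lies within radius_m of the reference."""
    return [name for name, (lat, lon) in tiles.items()
            if haversine_m(ref_lat, ref_lon, lat, lon) <= radius_m]


# Hypothetical tile index: tile name -> (latitude, longitude) of its centre.
tile_index = {"ulleung_tile": (37.5, 130.87), "seoul_tile": (37.57, 126.98)}
```

A real terrain pipeline would more likely use projected tile bounds than centre points, but the centre-distance test is enough to show how the scenario's reference coordinates drive which models are loaded.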


4.4 Animation Module 


The animation module manages changes in the position and orientation of the weapon 
systems in the scenario according to the simulation time. The application has two 
execution modes: real-time visualization and after-action review. In the after-action 
review mode, visualization is performed after the data processing module has parsed the 
simulation log of a completed simulation, saved in the ASI file format, so the 
visualization time is independent of the simulation time. In the real-time visualization 
mode, however, the visualization time depends on the simulation time, since the 
simulation and the visualization run at the same time. The visualization time advances 
with internal frame updates at regular intervals, and the position and orientation of 
each platform are updated with the most recent data available at that time. 
Additionally, because the update rate should be held at 60 Hz to produce smooth 
animations, as in games, in the after-action review mode the position and orientation of 
a platform are linearly interpolated at times for which no simulation data exist. This 
allows smooth animations to be created from simulation data with the irregular time 
intervals produced by discrete event simulations. 
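The linear interpolation used to resample irregular discrete-event output at 60 Hz can be sketched as follows. This is a minimal sketch, assuming positions are 3D tuples and orientation is reduced to a single heading angle in degrees; the function names are hypothetical, and interpolating the heading along the shorter arc is our own choice to avoid a spurious spin when the angle wraps past 360°.

```python
import bisect
import math


def lerp(a, b, u):
    return a + (b - a) * u


def lerp_angle_deg(a, b, u):
    # Interpolate along the shorter arc, so 350 -> 10 passes through 0, not 180.
    diff = (b - a + 180.0) % 360.0 - 180.0
    return (a + diff * u) % 360.0


def sample_state(times, positions, headings, t):
    """Position and heading at time t by linear interpolation between log samples.

    `times` must be strictly increasing; outside the logged range the nearest
    sample is held.
    """
    if t <= times[0]:
        return positions[0], headings[0]
    if t >= times[-1]:
        return positions[-1], headings[-1]
    i = bisect.bisect_right(times, t)          # times[i - 1] <= t < times[i]
    u = (t - times[i - 1]) / (times[i] - times[i - 1])
    pos = tuple(lerp(a, b, u) for a, b in zip(positions[i - 1], positions[i]))
    return pos, lerp_angle_deg(headings[i - 1], headings[i], u)


def resample(times, positions, headings, rate_hz=60.0):
    """States at a fixed frame rate from irregularly spaced discrete-event samples."""
    n_frames = math.floor((times[-1] - times[0]) * rate_hz) + 1
    return [sample_state(times, positions, headings, times[0] + k / rate_hz)
            for k in range(n_frames)]
```

For example, with samples at t = 0, 1 and 3 s, the frame at t = 0.5 s lies halfway between the first two samples, and a 3 s log yields 181 frames at 60 Hz.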


4.5 Multi-channel Module 


The multi-channel module uses a master-slave model to visualize the entire scene in the 
iCAVE facility. The master-slave model is one way to realize distributed visualization: 
the same application runs on all node PCs, and only the data needed for state 
synchronization are transferred. This model is advantageous for large-scale 
visualization because its network bandwidth requirement is relatively low [18]. 

At start-up, each multi-channel module reads its role, master or slave, from an external 
initialization file. On a slave node, the view frustum of the master node's camera is 
divided into seven parts, and the slave renders only the scene for its assigned camera 
region. The transferred data are divided into command information, which consists of 
one-time events, and streaming information, which requires continuous transfer. The 
streaming information is transferred from the master to the slaves at 60 Hz, and command 
information is transferred immediately when a command occurs on the master. Through this 
information transfer, the scene generated on the master is reproduced in the 
multi-channel environment. The visualized result in the iCAVE environment is illustrated 
in Fig. 4. 
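The role lookup and the seven-way division of the master camera's view can be sketched as below. This assumes an equal angular split of the 120° horizontal field of view, with each slave rotating its camera by a yaw offset (a natural choice for a cylindrical screen); the INI layout (`[node]`, `role`, `channel`) is a hypothetical initialization-file format, parsed here with Python's standard `configparser`.

```python
import configparser

TOTAL_HFOV_DEG = 120.0  # iCAVE horizontal field of view
NUM_CHANNELS = 7        # seven display channels


def channel_frustum(channel, total_hfov=TOTAL_HFOV_DEG, channels=NUM_CHANNELS):
    """(yaw_offset_deg, hfov_deg) of one slave channel; channel 0 is the leftmost."""
    if not 0 <= channel < channels:
        raise ValueError("channel out of range")
    width = total_hfov / channels
    yaw_offset = -total_hfov / 2.0 + (channel + 0.5) * width  # centre of this slice
    return yaw_offset, width


def node_setup(ini_text):
    """Read the node role from a hypothetical initialization file."""
    cfg = configparser.ConfigParser()
    cfg.read_string(ini_text)
    role = cfg.get("node", "role")
    if role == "master":
        # The master keeps the full frustum and streams state to the slaves.
        return {"role": "master", "yaw_offset": 0.0, "hfov": TOTAL_HFOV_DEG}
    yaw, hfov = channel_frustum(cfg.getint("node", "channel"))
    return {"role": "slave", "yaw_offset": yaw, "hfov": hfov}
```

Each slave then applies its yaw offset relative to the master camera orientation it receives, so the seven slices tile the master's view without overlap.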


4.6 UI Module 


The UI module enables the user to control and manipulate the visualization system 
through GUI elements such as buttons or scroll bars. For instance, loading a scenario 
file, terminating the application, controlling the environment and manipulating the 
camera position are all handled through user input commands. The UI module then 
transfers this information to the appropriate module through the interface. 


4.7. Graph Plot Module 


The graph plot module is a separate module that displays the detailed position and 
orientation values of individual platforms in the scenario. It runs on the handheld 
devices, as the interests of individual experts can differ. Thus, to realize an immersive 
multi-channel environment and provide expert-specific information at the same time, we 
deliver detailed information through personal devices: when diverse experts are involved 
in a decision-making process, displaying the detailed values of every platform on a 
single shared screen is not efficient. In our study, the handheld devices run the graph 
plot module, the data processing module, a simplified UI module and the multi-channel 
module, which synchronizes the visualization time, so that detailed values can be 
observed on a smartphone. 
Fig. 4. Visualization result in iCAVE at KAIST 


5 Result 


The entire system was implemented with the designed modules and operated with one master 
PC, seven slave PCs and two handheld devices, both Android smartphones. Two test 
scenarios were used: a decoy operation scenario, which includes the maneuvering of a 
submarine, a decoy and a battleship together with the movement of the torpedoes, and a 
surface-to-air and surface-to-surface missile operation scenario based on an 
engineering-level model. Each scenario includes three to six weapon systems, and in the 
decoy operation scenario two additional 3D models were visualized to represent the 
detection range of the torpedoes. 

Visualization was performed successfully in both the real-time and after-action review 
modes. However, in the orbit camera mode, the scene shook. The orbit camera mode always 
keeps the observed platform at the center of the camera view, so when the platform 
position is updated frequently, any small synchronization difference between the 
animation module and the multi-channel module produces visible defects. This problem can 
be solved by having each slave calculate the camera position itself, with the camera 
settings sent as command information instead of streaming the camera position. 
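The proposed fix, in which each slave computes the orbit camera position from its own synchronized platform position, can be sketched as follows. This is a minimal sketch under stated assumptions: the orbit parameters (target, yaw, pitch, distance) are assumed to arrive once as command information, the axis convention (y up, yaw measured about y from +z, pitch as elevation) is assumed, and the names `orbit_camera_position` and `SlaveOrbitCamera` are hypothetical.

```python
import math


def orbit_camera_position(target, yaw_deg, pitch_deg, distance):
    """Camera position on a sphere of radius `distance` around `target`.

    Assumed convention: y is up, yaw is measured about the y axis from +z,
    and pitch is the elevation angle above the horizontal plane.
    """
    yaw, pitch = math.radians(yaw_deg), math.radians(pitch_deg)
    horizontal = distance * math.cos(pitch)
    return (target[0] + horizontal * math.sin(yaw),
            target[1] + distance * math.sin(pitch),
            target[2] + horizontal * math.cos(yaw))


class SlaveOrbitCamera:
    """Orbit settings arrive once as command information; the position is computed locally."""

    def __init__(self):
        self.target_id = None
        self.yaw = self.pitch = 0.0
        self.distance = 10.0

    def on_command(self, target_id, yaw_deg, pitch_deg, distance):
        self.target_id = target_id
        self.yaw, self.pitch, self.distance = yaw_deg, pitch_deg, distance

    def update(self, platform_positions):
        # Use this node's own, already-synchronized platform position every frame,
        # so the camera and the platform always come from the same frame.
        target = platform_positions[self.target_id]
        return orbit_camera_position(target, self.yaw, self.pitch, self.distance)
```

Since camera and platform are derived from the same local state, a late or early camera packet can no longer pull the platform away from the centre of the view.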


6 Conclusion 


In this research, we proposed a method for developing a collaborative visualization 
environment for warfare simulation, covering the design of the required modules and 
their functions, and we implemented it using a commercial game engine. Regarding the 
interoperability of the visualization environment, we focused on semantic 
interoperability by using an existing data model from a tool frequently used in the 
defense modeling and simulation field. In addition, game-like scene generation is 
achieved at relatively low cost with an appropriate commercial game engine. Finally, the 
utility of the system is enhanced by using handheld devices to provide expert-specific 
information. 

In future work, HLA/RTI compliance could be addressed to improve the level of 
interoperability, for instance by referring to the RPR-FOM. Moreover, the functionality 
of the handheld devices could be expanded to provide a more efficient collaborative 
environment for the experts' decision-making process. 
Abstract. Virtual Environment for Life On Ships (VELOS) is a multi-user Virtual 
Reality (VR) system that supports designers in assessing, early in the design process, 
passenger and crew activities on a ship under both normal and hectic operating 
conditions, and in improving the ship design accordingly. Realistic simulation of the 
behavioral aspects of crowds in emergency conditions requires modeling panic and the 
social conventions of inter-relations. The present paper describes the enhanced crowd 
modeling approach employed in VELOS for ship evacuation assessment and analysis, based 
on the guidelines provided by IMO's Circular MSC 1238/2007. 


1 Introduction 


Following a series of incidents involving large numbers of fatalities on passenger 
ships [3], the International Maritime Organization (IMO) has developed regulations for 
RO-RO passenger ships that require escape routes to be evaluated by an evacuation 
analysis, as described in IMO's Circular MSC 1238/2007, entitled Guidelines for 
evacuation analysis for new and existing passenger ships. It is worth mentioning that, 
although the evacuation scenarios in MSC 1238/2007 address issues related to the layout 
of the ship and passenger demographics, they do not address issues arising in real 
emergency conditions, such as the unavailability of escape arrangements (due to flooding 
or fire), crew assistance in the evacuation process, family-group behavior, ship 
motions, etc. To compensate for such deficiencies, MSC 1238/2007 adopts the mechanism of 
safety factors. 

Crowd simulation is a complex task involving collision avoidance among a large number of 
individuals, path planning, trajectory generation and so forth. Depending on the 
application, further requirements arise; for example, real-time simulation is needed to 
populate virtual environments in VR systems. Moreover, a tool that simulates the 
behavioral aspects of crowds in emergency conditions must also model panic and the 
social conventions of inter-relations [4]. 

In general, three approaches are used to model crowd motion. In the fluid approach, fluid 
equations such as the Navier-Stokes equations are used to model crowd flow. The cellular 
automata (CA) approach uses discrete dynamical systems whose behavior is characterized by 
local interactions: each CA consists of a regular lattice of cells, and at each time step 
the state of each cell is recalculated by applying a set of rules to its neighboring 
cells [10]. 
The majority of crowd simulations use the particulate approach, also called the atomic 
approach. This is also the crowd modeling approach used in VELOS, and it is briefly 
presented in Section 2.1. The first pioneering work in this area was that of Reynolds, 
who simulated flocks of birds, herds of land animals and schools of fish. A later work by 
the same author extends these concepts to the general idea of autonomous characters, with 
an emphasis on animation and game applications. A social force model for crowd 
simulation was introduced by Helbing and Molnar in [13]. They suggest that the motion of 
pedestrians can be described as if they were subject to social forces (acceleration, 
repulsion and attraction) that measure the internal motivation of individuals to perform 
certain actions. By combining these three forces, they obtain an equation for a 
pedestrian's total motivation and, finally, the social force model. The social force 
model has also been applied to the simulation of building escape panic, with 
satisfactory results. 

The paper is structured as follows: Section 2 presents the basis of VELOS, VRsystem, 
along with its major components and functionalities, including a brief description of 
the crowd modeling approach employed for ship evacuation assessment and analysis, while 
Section 3 is devoted to our proposed additions to steering behaviors and crowd modeling 
that enable their use in ship evacuation analysis. The last section presents ship 
evacuation test cases investigating the effects of crew assistance, passenger grouping 
and fire incidents. Furthermore, an additional test case demonstrating the effects of 
ship motions on passenger movement is also included. 


2 The VELOS System 


VELOS is based on VRsystem, a generic multi-user virtual environment that consists 
mainly of two modules, the server and client modules, connected through a network layer. 
Figure 1 provides a schematic overview of the VRsystem architecture. As depicted in this 
figure, users participate in the virtual environment through the CLIENT module in the 
form of AVATARS, which enables them to be immersed in the virtual world and actively 
participate in the evacuation process by interacting with agents and other avatars. The 
system administrator, on the other hand, uses the SERVER module to create the virtual 
environment, set all properties and rules for the scenario under consideration (e.g., 
the scheduling of fire/flooding events), and wait for participants to connect to the 
system. The administrator may also interact with the system during the simulation phase. 

The server module comprises two major components, namely the VRkernel and the 
User-Interface, while the client module has a similar structure and comprises customized 
versions of them, referred to as VRkernelLT and User-InterfaceLT; see again Fig. 1. 
VRkernel is the core component of the VRsystem platform in the server module. It can be 
thought of as a library of objects and functions suitable for materializing the synthetic 
world with respect to geometric representations, collision detection, crowd modeling, 
motion control and simulation, event handling and all other tasks related to 
visualization and scene organization. The core functionalities of VRkernel are provided 
by Open Inventor, an OpenGL-based library of objects and methods used to create 
interactive 3D graphics applications. 
Fig. 1. The VRsystem Architecture 


2.1 Crowd Modeling for Ship Evacuation 


Crowd modeling is a major part of VRkernel and, in view of VELOS's areas of interest 
(evacuation, ergonomics, comfort), it could be considered the most significant of its 
components. It is based on agents, avatars, scene objects (such as obstacles) and 
steering behaviors technology. The term agent in VRkernel describes autonomous 
characters, defined as autonomous robots with some of the skills of a human actor in 
improvisational theater; see [12]. Avatars are the system users' incarnations within the 
virtual environment, and their major difference from agents is their controlling entity: 
humans for avatars versus the computer for agents. Steering behaviors technology is the 
core of VRkernel's crowd modeling and is presented in the following paragraphs, while the 
enhanced crowd modeling features for ship evacuation are presented in Section 3. 

The motion behavior of an agent is better understood by splitting it into 
three separate levels, namely action selection, steering and locomotion. In the 
first level, goals are set and plans are devised for the action materialization. The 
steering level determines the actual movement path, while locomotion provides 
the articulation and animation details. 
Agents' autonomy is materialized within the steering level, where the steering 
behaviors technology is applied. Specifically, agents' autonomy is powered by an 
artificial intelligence structure, referred to in the pertinent literature as the mind; 
see, e.g., [14]. The mind uses a collection of simple kinematic behaviors, called 
steering behaviors, to compose the agent's motion. For each time frame, the agent's 
velocity vector is computed by adding the mind-calculated steering vector to the 
previous velocity vector. The steering vector is a combination of the individual 
steering vectors provided by each steering behavior associated with the agent's mind. In 
mind modeling we employ two different approaches for calculating the steering vector. 
The first and rather obvious one, used in the simple mind, produces the steering vector 
as a weighted average of the individual ones. The second approach, called priority 
blending, takes priorities into account and is an enhanced version of the simple 
priority mind proposed in [12]. The agent's velocity at each time frame is calculated as 
follows: 


1. Compute the steering vector $\mathbf{f} = \sum_i w_i \mathbf{f}_i$, where $w_i$ are weights and $\mathbf{f}_i$ are the 
individual steering vectors from each simple behavior included in the agent's mind. 

2. Compute the new velocity as: 

$$\mathbf{v}_{new} = c\,(\mathbf{v}_{prev} + \mathbf{f}), \quad \text{where } c = \min\left\{1, \frac{v_m}{\|\mathbf{v}_{prev} + \mathbf{f}\|}\right\} \qquad (1)$$

where $v_m$ is the agent's maximum allowable velocity. 
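The two-step simple-mind update of Eq. (1) can be sketched as follows. This is a minimal sketch assuming velocities and steering vectors are stored as tuples of equal dimension; the function name `mind_velocity` is hypothetical, and the priority-blending variant is not covered.

```python
import math


def mind_velocity(v_prev, steering, weights, v_max):
    """Simple-mind update: f = sum_i w_i f_i, then v_new = c (v_prev + f) as in Eq. (1)."""
    dims = range(len(v_prev))
    f = [sum(w * s[k] for w, s in zip(weights, steering)) for k in dims]
    v = [v_prev[k] + f[k] for k in dims]
    norm = math.hypot(*v)
    # c = min{1, v_max / |v_prev + f|}: only shrink the velocity, never enlarge it.
    c = 1.0 if norm == 0.0 else min(1.0, v_max / norm)
    return tuple(c * x for x in v)
```

For example, with $\mathbf{v}_{prev} = (1, 0)$, steering vectors $(2, 0)$ and $(0, 4)$ with unit weights, the unclamped velocity is $(3, 4)$ with magnitude 5; with $v_m = 2.5$ the factor $c = 0.5$ scales it to $(1.5, 2)$.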


Nearly twenty steering behaviors have been implemented so far within VRkernel. These 
behaviors, based on the works of C.W. Reynolds and R. Green [14], include: Seek, Arrive, 
Wander, Separation, Cohere, Leader Follow, Obstacle Avoidance & Containment, 
Path-following, Pursuit, Flee, Evade, and offset-{Seek, Flee, Pursuit, Evade, Arrive}. 
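Two of these behaviors, Seek and Arrive, can be sketched in the commonly published Reynolds form: the steering vector is the desired velocity minus the current velocity, and Arrive ramps the desired speed down linearly inside a slowing radius. This is a sketch of that general formulation, not VRkernel's code; the helper names and the `slowing_radius` parameter are our own assumptions.

```python
import math


def _sub(a, b):
    return tuple(x - y for x, y in zip(a, b))


def _scale(v, s):
    return tuple(x * s for x in v)


def seek(pos, vel, target, max_speed):
    """Steer toward target at full speed: desired velocity minus current velocity."""
    offset = _sub(target, pos)
    dist = math.hypot(*offset)
    desired = _scale(offset, max_speed / dist) if dist > 0 else _scale(vel, 0.0)
    return _sub(desired, vel)


def arrive(pos, vel, target, max_speed, slowing_radius):
    """Like seek, but the desired speed falls off linearly inside slowing_radius."""
    offset = _sub(target, pos)
    dist = math.hypot(*offset)
    if dist == 0:
        return _scale(vel, -1.0)          # on target: cancel the remaining velocity
    speed = min(max_speed, max_speed * dist / slowing_radius)
    return _sub(_scale(offset, speed / dist), vel)
```

The returned vectors play the role of the individual $\mathbf{f}_i$ that the mind combines with weights $w_i$ in step 1 above.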


3 Enhanced Features of Crowd Modeling 


Crowd modeling, as described in [1], can be used to materialize a ship evacuation 
scenario adopting the advanced method of analysis proposed by IMO in Circular MSC 
1238/2007 and in [15]. Although this advanced method is more realistic than the 
simplified approach proposed in the same circulars, it is still subject to restrictive 
assumptions and omissions, e.g., regarding ship motions, fire/smoke, crew assistance and 
passenger grouping effects, which are collectively accounted for via corrective safety 
factors. Aiming to eliminate these restrictions, we enrich crowd modeling in VELOS with 
appropriate features, described in detail in the following subsections. These features 
include new behaviors, such as the Inclination behavior, which models the effect of ship 
motions, and the Enhanced-Cohere behavior, applied in passenger grouping, as well as 
behavioral models and aids such as the Triggers supporting crew assistance modeling. 
Finally, a passenger health index and ship space availability are introduced to model 
the influence of smoke and/or fire on the evacuation process. 
3.1 Modeling Ship Motions and Accelerations 


VELOS provides several interfaces for taking ship motions and accelerations into 
account. Specifically, there are modules for importing precomputed ship responses in 
either the frequency or the time domain. Furthermore, time histories of linear 
velocities and accelerations recorded by accelerometers at selected points aboard a ship 
can also be imported. Thus, ship accelerations can be either estimated via numerical 
differentiation of ship motions or acquired from experimental measurements. Generally, 
ship motions comprise time histories of the displacements of a specific point P of the 
ship (usually the ship's center of flotation) as well as time histories of the ship's 
rotational motions (pitch, roll and yaw). Using numerical differentiation, we can 
calculate the linear velocity $\mathbf{v}_P$ and acceleration $\dot{\mathbf{v}}_P$ of 
point P and the angular velocity $\boldsymbol{\omega}_B$ and acceleration 
$\dot{\boldsymbol{\omega}}_B$ of the ship. Then, using the following well-known relations 
from rigid-body kinematics, we can calculate the velocity and acceleration at every point 
Q on the ship: $\mathbf{v}_Q = \mathbf{v}_P + \boldsymbol{\omega}_B \times \mathbf{r}_{PQ}$, 
$\dot{\mathbf{v}}_Q = \dot{\mathbf{v}}_P + \boldsymbol{\omega}_B \times (\boldsymbol{\omega}_B \times \mathbf{r}_{PQ}) + \dot{\boldsymbol{\omega}}_B \times \mathbf{r}_{PQ}$, 
where $\mathbf{r}_{PQ}$ is the vector from P to Q. 
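The rigid-body relations above, together with the numerical differentiation step, can be sketched as follows. This is a minimal sketch using plain tuples; the function names are hypothetical, and treating the derivatives of the roll, pitch and yaw histories directly as the components of $\boldsymbol{\omega}_B$ is a small-angle assumption that the text does not state.

```python
def cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])


def vadd(*vectors):
    return tuple(sum(components) for components in zip(*vectors))


def point_kinematics(v_p, a_p, omega, alpha, r_pq):
    """Velocity and acceleration of point Q from those of reference point P.

    omega: angular velocity, alpha: angular acceleration, r_pq: vector from P to Q.
    """
    v_q = vadd(v_p, cross(omega, r_pq))
    a_q = vadd(a_p, cross(omega, cross(omega, r_pq)), cross(alpha, r_pq))
    return v_q, a_q


def central_difference(samples, dt):
    """Central-difference derivative of a uniformly sampled series (interior points only)."""
    return [(samples[i + 1] - samples[i - 1]) / (2.0 * dt)
            for i in range(1, len(samples) - 1)]
```

For a ship yawing at a constant 1 rad/s about z, a point 1 m ahead of P moves sideways at 1 m/s and experiences a 1 m/s² centripetal acceleration back toward P, which the second cross-product term captures.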

The effects of ship motions on passengers and crew aboard are modeled in two ways, 
presented in detail below. The first, simplified approach is based on kinematic 
modeling that uses the ship motions, while the second approach takes into account the 
dynamic nature of the phenomenon and relies on the availability of ship accelerations. 


Inclination Behavior. Advanced evacuation analysis in VELOS combines the available ship 
motion data with the so-called Inclination behavior, which has been introduced as a 
first layer for considering the effect of ship motion on agents' movement. The 
precomputed ship-motion history is imported into VELOS through a suitable series of 
interfaces. The Inclination behavior resembles, in definition and effect, a gravity field 
that hinders agent motion accordingly. Specifically, we consider a static global force 
vector $\mathbf{g}$ normal to the deck plane in the upright position of the ship. If the 
deck deviates from its upright position (i.e., non-zero heel and/or trim angles), the 
projection of $\mathbf{g}$ onto it acquires a non-zero value $\mathbf{g}_p$, which forms 
the Inclination steering vector as follows: $\mathbf{f}_I = \lambda(\phi)\,\mathbf{g}_p$, 
where $\lambda(\phi)$ is an appropriate weight function depending on the angle $\phi$ 
formed between $\mathbf{g}$ and the normal to the deck plane. The Inclination behavior is 
active when $\phi$ lies between two threshold angles: the lower threshold discards plane 
motions with a negligible effect on the agent's motion, while values above the upper 
threshold lead to movement inability, as the limit of the agent's balancing capabilities 
is surpassed. The threshold angles and the weight function $\lambda(\phi)$ are defined via 
experimental data; see, e.g., [17]. 
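The Inclination behavior can be sketched as a projection of $\mathbf{g}$ onto the tilted deck plane, weighted by $\lambda(\phi)$ between two thresholds. This is a minimal sketch under stated assumptions: the thresholds (2° and 20°) and the linear ramp used for $\lambda(\phi)$ are hypothetical placeholders for the experimentally defined values, and `g` is a unit vector along gravity in a frame where the deck normal tilts.

```python
import math


def inclination_behavior(deck_normal, g=(0.0, 0.0, -1.0), phi_low=2.0, phi_high=20.0):
    """Return (steering_vector, can_move) for the given deck normal.

    phi_low, phi_high (degrees) and the linear weight lambda(phi) are hypothetical.
    """
    n_len = math.sqrt(sum(c * c for c in deck_normal))
    n = tuple(c / n_len for c in deck_normal)
    g_len = math.sqrt(sum(c * c for c in g))
    dot = sum(a * b for a, b in zip(g, n))
    # Angle between the gravity line and the deck normal.
    phi = math.degrees(math.acos(min(1.0, abs(dot) / g_len)))
    if phi < phi_low:
        return (0.0, 0.0, 0.0), True        # negligible effect on the agent's motion
    if phi > phi_high:
        return (0.0, 0.0, 0.0), False       # balance limit surpassed: no movement
    g_p = tuple(gc - dot * nc for gc, nc in zip(g, n))   # projection of g onto the deck
    lam = (phi - phi_low) / (phi_high - phi_low)        # hypothetical lambda(phi)
    return tuple(lam * c for c in g_p), True
```

The resulting vector points down-slope along the deck, so adding it to the mind's steering sum slows agents walking uphill and pushes idle agents toward the lower side.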


Motion Induced Interruptions (MII). In certain weather conditions, i.e., rough weather, 
walking, and even more so working, aboard the ship becomes difficult, and even the most 
experienced sailors will experience events where they must stop their activity, be it a 
specific task or merely standing, and take suitable measures to minimize the risk of 
injury, or more generally change their stance so that balance can be retained; these 
events are called, in the pertinent literature, Motion-Induced Interruptions (MIIs). 
MIIs can be identified by considering the dynamic equations of motion of the person due 
to ship motion 
leading to the onset of loss of balance due to tipping or sliding. Baitis et al. and 
Graham et al. [19] have proposed the following relations for tips to port or starboard. 
Specifically, a tip to port will occur if 
$T_{LAT_p} = \frac{1}{g}\left(\frac{l}{h}\ddot{D}_3 - \ddot{D}_2 - g\eta_4 - \frac{h}{3}\ddot{\eta}_4\right) > \frac{l}{h}$, 
and analogously for a tip to starboard. Similarly, the following tipping coefficient can 
be derived for tips toward the aft part of the ship: 
$T_{LON_a} = \frac{1}{g}\left(\frac{d}{h}\ddot{D}_3 + \ddot{D}_1 + g\eta_5 - \frac{h}{3}\ddot{\eta}_5\right) > \frac{d}{h}$, 
and analogously for a tip to fore. 

In the above equations, $\eta_1$ (surge), $\eta_2$ (sway) and $\eta_3$ (heave) stand for 
the translational components, while $\eta_4$ (roll), $\eta_5$ (pitch) and $\eta_6$ (yaw) 
stand for the rotational components of ship motion along the x-, y- and z-axis of the 
ship coordinate system, respectively. Furthermore, 
$\mathbf{D} = (D_1, D_2, D_3) = (\eta_1, \eta_2, \eta_3) + (\eta_4, \eta_5, \eta_6) \times (x, y, z)$ 
denotes the displacement of point P(x, y, z). Finally, the symbols $l$, $h$ and $d$ 
denote the half-stance length, the vertical distance to the person's center of gravity 
and the half-shoe width, respectively. Typical values for $l/h$ lie in the interval 
(0.20, 0.25), while those for $d/h$ lie in (0.15, 0.17). 

Taking into account the above discussion concerning tipping coefficients, the 
effect of ship motions on passenger movement is implemented in the following 
way: 


1. Adjustment $\tilde{v}_m$ of the maximum allowable velocity $v_m$ according to the 
following rule: $\tilde{v}_m = k \cdot v_m$, where 

$$k = \begin{cases}
1, & T_{LAT} < 0.20 \wedge T_{LON} < 0.15 \\
-20T_{LAT} + 5, & 0.20 \le T_{LAT} \le 0.25 \wedge T_{LON} < 0.15 \\
(-20T_{LAT} + 5)(-50T_{LON} + 8.5), & 0.20 \le T_{LAT} \le 0.25 \wedge 0.15 \le T_{LON} \le 0.17 \\
-50T_{LON} + 8.5, & T_{LAT} < 0.20 \wedge 0.15 \le T_{LON} \le 0.17 \\
0, & T_{LAT} > 0.25 \vee T_{LON} > 0.17
\end{cases} \qquad (2)$$


2. Adjustment of the weights $w_i$ in the computation of the steering vector. A typical 
scenario would include an increase in the contribution of the Wander behavior and a 
decrease in the contributions of Obstacle Avoidance and Separation. 

3. Adjustment of the parameters of each individual steering behavior. 
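The velocity adjustment of Eq. (2) can be sketched as the product of a lateral and a longitudinal factor, each equal to 1 below its lower threshold and falling linearly to 0 at its upper threshold. This product form reproduces all five cases of Eq. (2). The function names are hypothetical, and computing $T_{LAT}$ and $T_{LON}$ from ship motions is left out.

```python
def mii_speed_factor(t_lat, t_lon):
    """Speed reduction factor k of Eq. (2) from the lateral/longitudinal tipping coefficients."""
    if t_lat > 0.25 or t_lon > 0.17:
        return 0.0                                          # balance cannot be kept
    k_lat = 1.0 if t_lat < 0.20 else -20.0 * t_lat + 5.0    # 1 -> 0 over [0.20, 0.25]
    k_lon = 1.0 if t_lon < 0.15 else -50.0 * t_lon + 8.5    # 1 -> 0 over [0.15, 0.17]
    return k_lat * k_lon


def adjusted_max_speed(v_m, t_lat, t_lon):
    """Adjusted maximum allowable velocity: k * v_m."""
    return mii_speed_factor(t_lat, t_lon) * v_m
```

Because $-20T_{LAT} + 5$ equals 1 at 0.20 and 0 at 0.25, and $-50T_{LON} + 8.5$ equals 1 at 0.15 and 0 at 0.17, $k$ varies continuously between full speed and standstill.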


3.2 Passenger Grouping 


Passenger grouping in VELOS, as presented in [21], is based on the Enhanced-Cohere 
behavior, an enhancement of the standard Cohere behavior. The Enhanced-Cohere behavior is 
responsible for keeping together agents that are not only geometrically close to each 
other (as in the standard Cohere behavior) but also belong to the same group, e.g., a 
family or a crew-guided group. For this purpose, each agent is endowed with an ID in the 
form of a fixed-length binary representation, and the new velocity vector of every agent 

408 K.V. Kostas et al. 


is obtained by applying the standard Cohere calculations on the subset of the 
neighboring agents that belong to the same group. 

By properly blending the Cohere behavior in this way, we can produce different grouping 
levels, which can be categorized as follows: 


Grouping Level 0: In this level, grouping is formed indirectly, via a common short-term 
target for the group members (e.g., followers of the same leader), or through the use of 
the standard Cohere behavior. 

Grouping Level 1: The members of the group are endowed with an ID and the 
Enhanced-Cohere behavior described above. Group cohesion is maintained only among nearby 
agents (within Cohere's neighborhood) sharing a common ID. However, if a member of the 
group moves out of the Cohere behavior's neighborhood, the remaining members take no 
action. 

Grouping Level 2: The members of the group are endowed with the same properties as in 
Level 1 and, moreover, at least one member (e.g., the group leader) is responsible for 
checking the group's integrity. In this way, group cohesion is maintained: if a member 
of the group is lost, the responsible agent takes corrective action, such as waiting 
for the lost member to rejoin the group or searching for the lost member. 
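Grouping Levels 1 and 2 can be sketched as follows. This is a minimal sketch under stated assumptions: positions are 2D tuples, group IDs are compared by equality (written as binary literals to echo the fixed-length binary IDs), and the cohesion steering simply points from the agent to the centroid of same-group neighbors. Reynolds-style Cohere often feeds that centroid into Seek instead. The names `enhanced_cohere` and `lost_members` are hypothetical.

```python
import math


def enhanced_cohere(agent_pos, group_id, neighbors, radius):
    """Level 1: steer toward the centroid of nearby agents sharing the same group ID.

    neighbors: iterable of (position, group_id) pairs.
    """
    same_group = [p for p, g in neighbors
                  if g == group_id and math.dist(p, agent_pos) <= radius]
    if not same_group:
        return (0.0, 0.0)                 # nobody from the group nearby: no action
    cx = sum(p[0] for p in same_group) / len(same_group)
    cy = sum(p[1] for p in same_group) / len(same_group)
    return (cx - agent_pos[0], cy - agent_pos[1])


def lost_members(leader_pos, members, group_id, radius):
    """Level 2 integrity check: IDs of group members outside the leader's neighborhood."""
    return [member_id for member_id, (p, g) in members.items()
            if g == group_id and math.dist(p, leader_pos) > radius]
```

A Level 2 leader would call `lost_members` periodically and, when the list is non-empty, switch to a corrective behavior such as waiting or searching.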


3.3. Crew Assistance 


Crew-Assistance behavior is offered by affecting the simple- or priority-mind 
mechanism in two ways, either by using Triggers or via the Guide Operation. 

A Trigger attached to a crew agent is both a scene object and a scene area (Neighborhood or TN). When a passenger agent visits this area, a prescribed list of actions or property changes, the so-called Trigger Actions (TAs), is applied to the agent. For example, if passenger density at the chosen TN exceeds a prescribed limit, a TA can enable the crew agent to redirect passengers towards the closest muster station along a path different from the main escape route; see Scenario 3 in §4.1.
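The density-triggered TA described above can be sketched as follows. The class names, the persons-per-square-metre density measure and the route labels are hypothetical illustrations and are not the VELOS implementation. A full model would also remove passengers from the TN when they leave it.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Passenger:
    name: str
    route: str = "main"  # escape route currently assigned


@dataclass
class DensityTrigger:
    """Hypothetical TN with a density sensor and one redirecting Trigger Action."""

    area_m2: float
    max_density: float  # persons per square metre tolerated before the TA fires
    alternative_route: str
    occupants: List[Passenger] = field(default_factory=list)

    def density(self) -> float:
        return len(self.occupants) / self.area_m2

    def on_enter(self, p: Passenger) -> None:
        # A passenger visits the TN; the TA is applied only while the limit is exceeded.
        self.occupants.append(p)
        if self.density() > self.max_density:
            p.route = self.alternative_route
```

With a 2 m² TN and a limit of 1 person/m², the first two passengers keep the main route. The third passenger pushes the density to 1.5 and is redirected.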

Guide Operation is implemented through the Enhanced-Cohere behavior and the basic Leader-Follow behavior. For example, the officer in charge may order a crew member to guide a group of passengers from a specific site to the closest muster station along a path different from the one provided by the evacuation plan; see Scenario 2 in §4.1.

Crew Assistance services could be further improved by properly combining Triggers with the Guide Operation. An example of this combined operation could involve a crew member charged with guiding a group of passengers blocked in a space where a fire is developing.


3.4 Influence of Smoke, Heat and Toxic Fire Products 


VELOS can model a fire event during the evacuation process by allowing passengers and crew to be influenced by the smoke, heat and toxic fire products present in the fire effluent. This is achieved by:


VELOS: Crowd Modeling for Enhanced Ship Evacuation Analysis 409 


- importing precomputed time series of fire products, obtained with different methods for calculating fire growth and smoke spread in multiple compartments (see, e.g., [...]),

- setting the time of the fire outbreak (before, simultaneously with, or after the start of the evacuation),

- modeling the influence of fire products on the behavioral model of agents with the aid of the Function Health_Index presented below,

- visualizing the fire products in the synthetic world.


Function Health_Index: To model the influence of fire products on agents, we introduce the Health Reduction Rate function:


$$\mathrm{HRR}(t) = F\big(a\,T(t) + b\,C_{CO}(t)\big) \quad \text{(Health\_units/s)} \tag{3}$$


where F describes the functional model used, T is the temperature (°C) and C_CO is the carbon monoxide concentration (ppm) in the space where the agent is located at time t (see §4.2), with coefficients a and b. We now introduce the Health Index function:


$$\mathrm{HI}(t) = 1 - \int_0^t F\big(a\,T(\tau) + b\,C_{CO}(\tau)\big)\,d\tau \tag{4}$$


where we have assumed that the initial Health Index of all agents is 1. When the Health Index of an agent reaches zero, the agent is considered dead. Moreover, as an agent's Health Index deteriorates, its maximum speed (ability to walk) is also reduced according to a suitable law.
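Equations (3) and (4) can be evaluated numerically from the imported fire time series. The sketch below integrates the sampled HRR of Eq. (3) with the trapezoidal rule. The text does not specify the functional model F, so the identity function is used as a placeholder default. The coefficient values and the linear speed law in `max_speed` are also assumptions, for illustration only.

```python
from typing import Callable, List, Sequence


def health_index(
    times: Sequence[float],
    temperature: Sequence[float],
    co_ppm: Sequence[float],
    a: float,
    b: float,
    F: Callable[[float], float] = lambda x: x,  # placeholder functional model (assumption)
) -> List[float]:
    """Health Index at each sample time, following Eqs. (3) and (4).

    HRR(t) = F(a*T(t) + b*C_CO(t)) is integrated with the trapezoidal rule;
    HI starts at 1 and is clamped at 0 (agent considered dead).
    """
    hrr = [F(a * T + b * c) for T, c in zip(temperature, co_ppm)]  # Eq. (3)
    hi = [1.0]
    for k in range(1, len(times)):
        dt = times[k] - times[k - 1]
        loss = 0.5 * (hrr[k] + hrr[k - 1]) * dt  # trapezoid over [t_{k-1}, t_k]
        hi.append(max(0.0, hi[-1] - loss))
    return hi


def max_speed(base_speed: float, hi: float) -> float:
    # Hypothetical "suitable law": maximum walking speed scales linearly with HI.
    return base_speed * hi
```

For example, a constant HRR of 0.1 Health_units/s lowers HI from 1 to 0.9 after one second and to 0.8 after two seconds.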

Function Space_Availability: In a typical ship evacuation simulation, the pathfinding module of VELOS computes the path each passenger must follow from their initial position to their designated muster station. The algorithm employed is Dijkstra's shortest-path algorithm. It is applied to the ship's topological graph, whose nodes correspond to ship spaces and whose edges correspond to doors and/or passageways. In the simplest case, the weight of the edge between two connected nodes is the walking distance between the center points of the two spaces. The weighting scheme becomes more complex when space availability is considered, because the availability of each ship space contributes to the weights of the edges of the topology graph. For example, an increase in ambient temperature or CO concentration, or a decrease in visibility, in a certain space increases the weighting factors of the edges connected to the graph node representing that space. Consequently, paths passing through this space are less likely to be chosen by the path-planning algorithm. Furthermore, when certain temperature, CO concentration and visibility thresholds are crossed, the corresponding space(s) are rendered unavailable, i.e., removed from the topological graph.
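The availability-aware routing can be sketched with Dijkstra's algorithm on a weighted adjacency list. The following elements are all hypothetical: the penalty formula, the thresholds `T_MAX`, `CO_MAX` and `VIS_MIN`, and the choice to penalize an edge by the worse of its two end spaces. The text states only two things. Hazards raise the weights of edges connected to the affected node, and spaces beyond the thresholds are removed from the graph.

```python
import heapq
from dataclasses import dataclass
from typing import Dict, List, Optional, Tuple


@dataclass
class SpaceState:
    temperature: float  # ambient temperature, deg C
    co_ppm: float       # CO concentration, ppm
    visibility: float   # visibility distance, m


# Hypothetical limits beyond which a space is removed from the graph.
T_MAX, CO_MAX, VIS_MIN = 60.0, 1400.0, 2.0


def available(s: SpaceState) -> bool:
    return s.temperature < T_MAX and s.co_ppm < CO_MAX and s.visibility > VIS_MIN


def penalty(s: SpaceState) -> float:
    # Hypothetical weighting factor: grows with heat, CO and loss of visibility.
    return 1.0 + s.temperature / T_MAX + s.co_ppm / CO_MAX + VIS_MIN / max(s.visibility, VIS_MIN)


Graph = Dict[str, List[Tuple[str, float]]]  # space -> [(neighbour space, walking distance)]


def hazard_weighted_path(
    graph: Graph, states: Dict[str, SpaceState], start: str, goal: str
) -> Optional[List[str]]:
    """Dijkstra's algorithm on hazard-weighted edges; unavailable spaces are skipped."""
    dist = {start: 0.0}
    prev: Dict[str, str] = {}
    heap = [(0.0, start)]
    visited = set()
    while heap:
        d, u = heapq.heappop(heap)
        if u in visited:
            continue
        visited.add(u)
        if u == goal:
            break
        for v, length in graph.get(u, []):
            if v in visited or not available(states[v]):
                continue  # space treated as removed from the topological graph
            # The edge is penalized by the worse of the two spaces it connects.
            w = length * max(penalty(states[u]), penalty(states[v]))
            if d + w < dist.get(v, float("inf")):
                dist[v] = d + w
                prev[v] = u
                heapq.heappush(heap, (d + w, v))
    if goal not in visited:
        return None  # no available route to the goal
    path = [goal]
    while path[-1] != start:
        path.append(prev[path[-1]])
    return path[::-1]
```

In a small four-space graph, raising the temperature of one intermediate space is enough to divert the returned route through the neighbouring space. If both intermediate spaces are pushed past the thresholds, no route remains and the function returns `None`.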


4 Test Cases


In this section, we use VELOS to perform evacuation analysis for a RO-RO passenger ship: 1. with and without crew assistance and grouping behaviors, and 2. with and without a concurrent fire event. We also examine the effect of ship motions on passenger movement in the test case described in §4.3.
4.1 Crew Assistance and Grouping


In the first test case, one hundred passengers are located in the cabins of Deck 5 (see Fig. 2) in the aft vertical zone of a ship, while the Muster Station is located on Deck 7. Population demographics are as proposed in [2]. For every simulation run, the population is randomly distributed in the aforementioned areas. Three variations of this scenario are simulated 3000 times each. For each variation, we compute the travel time required for all passengers to reach the Muster Station. We also compute the cumulative arrival time, i.e., the percentage of passengers that have reached the Muster Station at each time unit.
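The cumulative arrival curve and the travel time can be computed from the arrival times recorded in each run. The sketch below makes three assumptions, none of them taken from VELOS. The time unit is 1 s. A passenger counts as arrived from the first whole second at or after their arrival. The curves of all runs are averaged point by point. The function names are also hypothetical.

```python
import math
from typing import List, Sequence


def cumulative_arrival(arrival_times: Sequence[float], horizon: int) -> List[float]:
    """Percentage of passengers at the muster station at t = 0, 1, ..., horizon (s)."""
    n = len(arrival_times)
    counts = [0] * (horizon + 1)
    for t in arrival_times:
        k = math.ceil(t)  # counted from the first whole second at or after arrival
        if k <= horizon:
            counts[k] += 1
    curve, arrived = [], 0
    for c in counts:
        arrived += c
        curve.append(100.0 * arrived / n)
    return curve


def average_curve(runs: Sequence[Sequence[float]], horizon: int) -> List[float]:
    # Point-by-point mean of the per-run cumulative curves.
    curves = [cumulative_arrival(r, horizon) for r in runs]
    return [sum(col) / len(curves) for col in zip(*curves)]


def travel_time(arrival_times: Sequence[float]) -> float:
    # Time needed for all passengers of one run to reach the muster station.
    return max(arrival_times)
```

For example, `cumulative_arrival([0.5, 1.0, 2.5, 3.0], 3)` gives `[0.0, 50.0, 50.0, 100.0]`.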

In the first variation (Scenario 1), passengers follow the designated escape route without crew assistance; Fig. 3 provides a snapshot of the evacuation process. The other two variations involve crew assistance. In Scenario 2, two crew members direct passengers to follow two distinct routes (see Fig. 4). In Scenario 3, a crew member monitors passenger density at a specified location and, whenever congestion is likely to arise, redirects a group of passengers towards a secondary escape route (see Fig. 5). In both cases, crew assistance is implemented through Triggers. In Scenario 2, TAs are applied to all passengers passing through the corresponding TN, while in Scenario 3 the TAs are dynamic, driven by the attached density sensor.

Figure 6 depicts the average cumulative arrival time for each scenario. As this figure shows, Scenarios 2 and 3, which are based on crew assistance and grouping, perform considerably better than Scenario 1. Of Scenarios 2 and 3, the latter is marginally better as a result of the dynamic crew-assistance policy it adopts. Analogous conclusions can be drawn from Fig. 7, which depicts the travel-time distributions of the three scenarios. The average travel times for Scenarios 1, 2 and 3 are 147 s, 112 s and 113 s, respectively. Moreover, in Scenarios 2 and 3 the travel-time distribution is narrow-banded, which reflects the effectiveness of the adopted evacuation processes compared with Scenario 1.


4.2 Fire Event 


In this test case, we use the same arrangement and passenger distribution as in the first test case; see Fig. 2. Population demographics are as proposed in [2]. A fire event occurs simultaneously with the beginning of the evacuation process. The initial fire site is located on Deck 5 and depicted in Fig. The fire propagation, along with the temperature distribution, carbon monoxide (CO) concentration and visibility due to smoke, has been precomputed for all affected spaces on Deck 5, and the time history of all corresponding quantities has been imported into VELOS. The fire and its products (temperature, CO concentration and visibility degradation due to smoke) affect the availability of ship spaces as well as the passengers' movement capabilities and health. Space availability changes are implemented via the edge-weighting mechanism described in §3.4. For every simulation run, the population is randomly distributed in the aforementioned areas. The fire scenario under consideration is simulated 360 times. For each run, we record the travel time required for all passengers to reach the Muster Station and compute the cumulative arrival time, i.e., the percentage of passengers reaching the Muster Station at each time unit. As illustrated in Fig. 8, around 30% fewer passengers reach the Muster Station than in the evacuation without the fire event. This is caused by passageways blocked by fire and by the resulting fatalities. The same figure also shows a slight acceleration of the evacuation process in the fire-event case. This occurs because the fire reduces the effective evacuating population, so fewer passengers use the available spaces and passageways.


4.3 Ship Motions’ Effect 


This last test case examines passenger movement on Deck 5 of a RO-RO passenger ship with and without consideration of ship motions. Specifically, we simulate the movement of two groups of passengers (20 persons) from points A and B, respectively, to point C (see Fig. 2). The movement is simulated both in still water and in a sea state described by a wave spectrum with 4 m significant wave height, 11 s peak period and 90° ship heading (beam seas). Ship responses were precomputed with the SWAN seakeeping software package and imported into VELOS. The cases examined are as follows:

1. still water (no waves);
2. the sea state described above, with kinematic modeling of motion effects through the inclination behavior;
3. the same sea state, with dynamic modeling using tipping coefficients.

Figure 9 depicts the average cumulative arrival time at point C for each of the three cases. Each case has been simulated 500 times, and the average travel times and arrival rates at point C have been collected. As the figure shows, the prescribed passenger movement takes the least time in still water. The wavy sea state induces ship motions that hinder passenger movement, and its effect is reflected in the rightward shift of the other two curves. The total travel time is about the same for the inclination-behavior and tipping-coefficient models (~70 s). It is considerably higher than in the still-water case (~50 s), where, obviously, no motion effect is considered. However, the arrival rate (slope) of the tipping-coefficient curve is steeper than that of the kinematic approach.
Abstract. This paper suggests an approach to help identify suitable areas of application for AR within the product design process. The approach uses an established methodology for product design development that allows each stage of the design process to be identified and considered in a logical and structured manner. This lets us assess, at each stage, the suitability of AR compared with hand drawings, basic computer-aided design, virtual reality, or rapid prototyping and similar techniques for producing physical models. As an example, we consider the concept design stage of the product design process and conduct some preliminary experiments on the use of AR to facilitate this activity.


Keywords: Augmented reality, product design, total design, concept design, 
industrial design. 


1 Introduction 


It is apparent that, within the realm of product design and manufacture, there is an ongoing need to reduce the time between the identification of a market need for a product and the satisfaction of that need in the form of a finished product that meets the customer's requirements. Over the past few decades, an important means of meeting this need has been Concurrent Engineering. This approach attempts to consider product design, development, manufacture, delivery, maintenance, and end-of-life issues in an integrated and parallel manner. It has been employed by many major manufacturing companies and uses multidisciplinary teams composed of, for example, component suppliers, product design and manufacturing engineers, purchasing personnel, and customers. Efficient and unambiguous communication of ideas is essential throughout this activity, and we consider here how it can be facilitated by the use of Augmented Reality.

As an essential part of this process, the product must be designed in a rational manner, and a number of methodologies have been developed to achieve this. One of these is ‘Total Design’, developed by Pugh [1] and defined as
“The systematic activity necessary, from the identification of the market /user need, to 
the selling of the successful product to satisfy that need — an activity that encompasses 
product, process, people and organization.” The elements of this methodology are summarized here and used as a vehicle for identifying specific aspects of the design process where AR could be usefully employed. We show that AR benefits not only the product designer but also communication with the final customer and with others involved in the integrated, concurrent engineering exercise of new product innovation.


2 Total Design 


Total Design is a methodology that allows a rational and detailed approach to product 
design from identification of a market need through to satisfaction of that need by the 
provision of a desired product. The main stages involved include the following. 


1. Based on the market need, a Product Design Specification (PDS) is produced. This is a comprehensive document that forms the basis of all the work that follows. It does not state what the final design should be, but it sets the criteria the design must satisfy. As this stage does not require any graphical images, AR is not relevant here.

2. Once the PDS has been completed, the ‘concept design’ stage is implemented to create and critically assess potential designs that can satisfy the PDS. Various techniques, such as brainstorming, are employed to generate concepts, which are then compared and evaluated using decision matrices in order to select a final concept. This paper will show that AR is potentially very useful at this stage.

3. ‘Detail design’ is now carried out to develop the concept design into a practical form. Here the individual components and sub-assemblies are designed, and accompanying detailed calculations for mechanical, thermodynamic, electrical, electronic and other aspects are carried out. Other ‘design for X’ considerations are also addressed within this process. Important examples include design for manufacture and assembly, design for ergonomics, design for maintenance, design for the environment, and design for remanufacturing. While AR could be used at this stage, there is much more scope for applying established computer-aided design and simulation methods to develop and examine the design. Larger products such as ships and aircraft will also benefit from the use of virtual reality at this stage.

4. Manufacturing the product: at this stage, simulation packages for factory layout are used together with computer-aided process planning and other computer-based tools to optimize the work flow, material control, and final dispatch. However, there is also an opportunity here to use AR when considering the positioning of production machinery such as industrial robots, CNC machines, conveyors, etc.

5. Finally, at the stage where the product is being delivered to, and used by, the cus- 
tomer there are already applications in commercial use for AR in product advertis- 
ing and as an aid for product maintenance and repair. 
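Stage 2 compares concepts with decision matrices. In Pugh's concept-selection matrix, each concept is rated on every criterion against a reference concept (the datum) as better (+), the same (S) or worse (-). The sketch below tallies these ratings; the concepts and ratings used with it are hypothetical. In Pugh's method the totals guide discussion and further iteration rather than making the decision automatically.

```python
from typing import Dict, List


def pugh_summary(matrix: Dict[str, List[str]]) -> Dict[str, Dict[str, int]]:
    """Count '+', 'S' and '-' ratings of each concept against the datum concept."""
    summary = {}
    for concept, ratings in matrix.items():
        plus, minus = ratings.count("+"), ratings.count("-")
        summary[concept] = {"+": plus, "S": ratings.count("S"), "-": minus, "net": plus - minus}
    return summary


def rank_concepts(matrix: Dict[str, List[str]]) -> List[str]:
    # Order concepts by net score, highest first; sorting is stable, so ties keep input order.
    summary = pugh_summary(matrix)
    return sorted(summary, key=lambda c: summary[c]["net"], reverse=True)
```

For example, a concept rated `["+", "S", "-", "+"]` against the datum scores two pluses, one minus and a net of +1.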


2.1 Concept Design


This paper focuses on the potential use of AR in stage 2, the concept design stage. Some studies have noted that CAD modelling can be harmful in the early stages of the design process. At a point when concepts should be exploring innovation and development, representing a component in this form is deemed too detailed and overbearing [2,3]. Specifically, a study by Benami and Jin [4] states that “The essential finding from the experiment was that ambiguous entities stimulate behaviours more than non-ambiguous entities”. Based on these observations, it is apparent that the use of basic CAD modelling at this stage of the design process can potentially stunt a designer's creativity. However, we consider that Augmented Reality may increase levels of creativity in conceptual design while allowing appropriate interaction and detail for the designer.

Two systems, presented by Fuge et al. [5] and Fiorentino et al. [6], have looked at the use of AR in conceptual design and product realisation. The system presented by Fuge et al. focused on the construction of freeform surfaces. It addressed the role of multiple shape representations, and the user interacted through a data glove and a head-mounted display (HMD) to create an immersive environment. The system was successful in that it allowed rapid creation of freeform surfaces without the constraints generally required in CAD modelling. A similar system was presented by Fiorentino et al. [6], where semi-transparent glasses were used instead of an HMD. Again, the system allowed a designer to create freeform curves and surfaces in an AR environment. Although the objective of the system was to assist in product realisation, the authors also suggested using AR to support Rapid Prototyping technologies. They observe that using trial and error to evaluate design iterations is “one of the biggest bottlenecks in the industrialisation process.” This observation was also acknowledged by Verlinden and Horvath in two separate publications [7, 8], which introduced the idea of using AR to assist concept generation and reduce design iterations. However, there appears to be a knowledge gap here, as the use of AR to support concept realisation has not yet been fully investigated. Ong et al. [9] also applied AR in the early design stages of the product development process. They introduced a spatial AR (SAR) configuration in which real-world images or textures are projected onto a physical shape model to give the impression of the final design, which can then be inspected. This is a very basic use of augmented reality, as an image is simply projected using a projector. Another study looked at the use of augmented reality to aid the visualisation of Computer-Aided Design (CAD) parts [10]. It found that certain students had difficulty with the spatial cognition of the multi-view projections of a CAD model they had created. To resolve this, a quick response (QR) code was placed on the drawing, and AR software was then used to view the corresponding 3D model, aiding the students' spatial cognition.

The systems and applications presented suggest that AR has the potential to replace traditional methods of design evaluation. In introducing their research, Park [11] discusses the pros and cons of CAD modelling within the design process. Park notes that CAD is a key component for conceptualisation and product realisation, but also that it has a “fundamental problem of intangibility”. It is thought that AR applications within product development can be used to overcome these issues.
2.2 Collaborative Design


In the current design climate, it is not uncommon for design teams to be working in separate countries or even continents. Synchronous and asynchronous working has become a vital component of the design process, and collaborative design applications must support it. Even when a design team is working together in one place, the group is likely to consist of members from many backgrounds and disciplines. To support these members, design techniques that easily represent a product concept or component for design evaluation must be used. AR technologies have been implemented extensively for collaborative design applications [12-16], allowing a design to be represented, evaluated and modified in a group environment.

It has repeatedly been observed that, although CAD systems are a vital component of the current product development process, they lack the “natural feel” provided by traditional methods of product realization. One result of this is a lack of tactile feedback to the user regarding their design.

Collaboration with users, clients and other stakeholders throughout the design process is vital, as it allows for the development of usable and useful products [17]. It enables a “human-centric” approach to design, creating solutions that are directly influenced by the user and other stakeholders.

One of the main issues when designing products for clients is the fragmentation of the client-designer relationship. The same applies to the relationship between the designer and a senior manager or CEO of a company who may not be familiar with the design process. Schumann et al. noted that “Nowadays the convincing presentation of new products is a lengthy and often very expensive task” [18]. This is due to the differing experience levels of stakeholders, which can make communicating ideas very difficult for the designer. Wang explained that, while working at a conceptual level, designers will tend to “interpret client needs and desires into artistic form” [19]. However, this can create issues, as the client may be unfamiliar with the “language of design” at this very early stage of the process.


3 Augmented Reality and Mobile Technology 


The main problem with using mobile devices for AR has been their limited computational power; recently, however, this has been largely ameliorated. Nee et al. argued that “higher processing power and hardware, such as high resolution camera, touch screen and gyroscope etc. have already been embedded in these mobile devices” [20]. A number of relatively advanced mobile AR systems were released in 2013, including Aurasma [21], Metaio Junaio [22] and Layar - Augmented Reality [23]. These apps are readily available on modern smartphones and other mobile devices. Modern mobile devices are therefore well suited to augmented reality applications.

The question we therefore pose is: does using a mobile device to deliver an augmented reality application add value to the concept design process?
4 Design of the Experiment


The experiment involved two parts. First, the participants were asked to work through multiple scenarios using augmented reality to analyse concepts. A mobile application that allows custom AR environments to be created on a mobile device was used alongside two basic mock-ups made from simple materials. The type of augmented reality used is video see-through, which modern mobile devices can implement. CAD models were superimposed onto the two basic physical mock-ups to imitate viewing the models in real life. The user could hold and touch the mock-ups, and through the mobile device it appeared as if they were handling the CAD model. This is a form of passive haptic feedback. The user was asked to analyse the concepts against basic criteria. During this first part, an informal interview was conducted in which questions relating to the topic and the experience were asked. Responses were noted, and common answers were analysed to reach a conclusion. The second part of the experiment was a questionnaire. It asked about the experimental technique's usability and practicality and how it compared with other techniques the user had experienced.


4.1 Software 


After analysing various options, we chose ‘Metaio’ [24], an AR program in which custom computer-generated models can be integrated into an environment chosen by the user. To do this, any 3D model can be imported into the Metaio Creator, where the model's dimensions and position can be altered. A target is then used for tracking and allows the chosen model to appear in the real world. Once the position of the model over the target has been decided, it is fixed, and the user can manipulate the model only by handling the object to which the tracker is attached. This is important as it replicates viewing a model in real life. The program links directly to the Metaio Cloud, and every user can create a ‘channel’ that contains their custom augmented reality developments. These channels are held in the Cloud and can be viewed on mobile devices using the ‘Junaio’ application developed by ‘Metaio’, with which anyone can view your models using specific targets. The 3D models used were OBJ (object) files.

The CAD models used were sourced from TurboSquid.com, an online source of professional 3D models. OBJ files, which are well suited to the ‘Metaio’ software, can be downloaded from this source. The models chosen were similar to each other. This allowed a more focused evaluation, similar to that of concepts created within the same project. Models of two mobile phones were used: the Nokia N82 and the Sony Ericsson W960i.

The aim was to make the prototypes very simple. They were made of white foam card cut and shaped to the size of the CAD model. No detail was included in the mock-up, as the CAD model was intended to show the detail. Participants were meant to understand that the simplest mock-up could be created and a CAD model then projected over it, as this would take minimal time in a design process.
Fig. 1. AR displays of hand-held mobile phone model


4.2 Basic Scenarios 


Several scenarios were developed to replicate how a designer may examine a prototype during the evaluation stage of the design process. Klinker et al. (2002) produced a set of scenarios that exemplify how a car designer would use their augmented reality system. These scenarios differ from the basic prototype analysis that may be carried out for a consumer product, so our set of scenarios was adapted from those used when testing augmented reality for car design.


1. Handling: The designer views the product holding the prototype in one hand and the mobile device in the other, rotating the product as if analysing its form.

2. Overview: The designer places the product on a surface and holds the device to view it, changing the position of the device to evaluate the prototype from various angles.

3. Detail viewing: The designer views a specific detail of the model, such as a particular component or material, by handling the product and the mobile device.

4. Compare: The user is asked to compare the model with another that they have not yet viewed, and the way the user chooses to view the other prototype is recorded.

During each of these scenarios, comments and visual impressions were recorded. The experiments focused on the evaluation of concepts by a single designer. Each designer was asked to undertake these scenarios to evaluate two given augmented reality concepts, using a provided mobile device with the AR app installed.

The criteria used for evaluation were as follows:

- Quality: Which model appears to be of higher quality (build quality, material, etc.)?
- Robust: Which model appears more robust, in that it can resist impact from being dropped?
- Aesthetic Appeal: Which model is more aesthetically pleasing?
- Usability: Which model appears more user-friendly, i.e. simple and easy to use?
- Purchasing: Which product would you purchase on first impressions?
4.3 Informal Questions


The questions set for the experiment were only guidelines and could change depending on the participant. They were as follows:

- How do you find the Augmented Reality system?
- Do you feel you can visualise the model clearly?
- Does the technique work as you had first imagined?
- How does this compare with other techniques for model viewing that you have used?
- Is this a technique that you can see yourself using in the future? If not, do you see it being used in the future when the technology advances?
- What are the advantages and hindrances, if any, of this technique?


Participants

The experiments included participants of varying age and experience in the design process. Participants were sourced from the Design, Manufacture and Engineering Management Department of the University of Strathclyde. Students in their fourth and fifth year were included, as they had accumulated reasonable experience in the field.


5 Experiment Results and Discussion 


In this section, each statement provided to the participants is shown, followed by the responses in graphical and textual form.

Statement - ‘The use of augmented reality to view and evaluate a model is more intuitive than when viewing a model within a 3D CAD program.’
The graph below shows a clear result: no participant disagreed with the statement. This indicates that the large majority of participants consider augmented reality a more intuitive tool for viewing concepts than a CAD program on a screen. However, three participants neither agreed nor disagreed, so their comments were analysed to investigate the comparison of techniques further.


[Figure: distribution of participant responses to the statement above (Strongly Disagree, Disagree, Neither Agree nor Disagree, Agree, Strongly Agree)]


Most participant comments favoured Augmented Reality over CAD, for a variety of reasons. Several participants noted the novelty of AR compared with CAD: it is ‘fun and interactive’ and would therefore help to promote concepts to others or to involve others in the evaluation process. Others appreciated being able to control the model fully and intuitively, much as they naturally like the control they have when handling a finished product. One participant stated that customers ‘may look at a CAD model and think “very good, but how does that affect me”’. Compared with CAD, users found that they were able to ‘minutely adjust the view easily’ and that ‘user adjustments become instinctive’. One participant noted that this may only be true of hand-held products, as a larger product may be more difficult to assess if it cannot be handled. For future work, we suggest using models of a variety of sizes to explore the application further. These responses indicate that augmented reality adds value to the concept design stage of the product design process.


Statement - ‘The use of augmented reality to view and evaluate a model is more intuitive than viewing a model on an engineering drawing.’
The responses indicate that the technique is much more accessible to those who may not be familiar with the format of engineering drawings, i.e. those with little manufacturing or product design background. One participant noted that ‘drawings give no sense of scale and struggle to convey emotive shades.’ Another stated that ‘2D shapes on engineering drawings provide little user feedback.’ This view was common during the experiments. Several participants saw benefits in both techniques. One participant noted that an engineering drawing contains more detail, such as dimensions, materials, assembly information and a bill of materials. By contrast, the augmented reality model shows only the outer aesthetics, placed in the context of the surrounding environment. For those who require manufacturing details or a higher level of product detail, augmented reality may not be beneficial. Other participants agreed, one stating that augmented reality is more beneficial ‘in some aesthetic aspects though it lacks obvious information on construction, fit, materials, etc.’

One definite benefit lies in the collaborative design and evaluation of products with clients, customers and others who may not be familiar with the design process. This is due to the overall intuitiveness of the AR technique, which allows people to hold and view the product as if it were in front of them. The technique feels natural, much like viewing and handling a finished product.
Statement - ‘You would use augmented reality in future work for concept design.’


[Chart: responses on a five-point scale: Strongly Disagree, Disagree, Neither Agree nor Disagree, Agree, Strongly Agree]


These results, together with discussions with the participants, show that participants would use this technique for a variety of applications in concept design. They saw real benefit in the use of AR in this context, but also saw where the technology could be developed to increase the number of possible applications.

The main application identified was presenting ideas to people who may not be familiar with CAD, e.g. customers or clients, because ideas can be shown to anyone at any time: all that is needed is a rough model and a mobile device such as a smartphone or tablet. In this way, quick, early feedback can be gained from customers or clients, and they can also be involved in the development of early concepts, owing to the accessibility and intuitiveness of the technology. One participant mentioned that it would be beneficial to show to clients, as it is similar to the empathetic modelling technique for early design. Multiple concepts could also be shown to different user groups very early on, without the need to create detailed physical prototypes. This would give early insight into necessary design changes, saving time and money during the process. One participant also stated that it is a very cost-effective method of ‘prototyping’ and that they would use it to develop designs quickly and efficiently.

Multiple participants noted that one of the main benefits of the technology was the ability to see the product in the context of the real world. Viewing the model is very intuitive, as the user gets an instant impression of the scale and dimensions of the product, unlike other techniques such as viewing a CAD model on a screen or an engineering drawing. One participant described it as a ‘dynamic form of product evaluation’. Despite these real benefits, it was noted that AR may only be beneficial for hand-held products that users can interact with. For larger products it may be less beneficial, since using a small screen to view a 1:1 scale model may not be practical, and the convenience of taking small, rough hand-held models to meetings and clients is lost.
Statement - ‘AR technology is beneficial for presenting concepts to managers, CEOs, customers, clients or anyone who may not be as familiar with CAD but it is necessary for them to view and understand the design.’
Of all the questions in the questionnaire, this one produced the most conclusive results. With 29 of 30 participants selecting either agree or strongly agree, there is clearly a real benefit in using AR in this context. Many agreed that the benefit lies in the intuitiveness of the technology for those who may not be familiar with other techniques such as CAD or 2D drawings. One participant stated that ‘for those not used to CAD, it would provide an easy hands-on method of viewing concepts.’ Many agreed that the tactile feedback gained from a model aids understanding of it in terms of design, feel, weight, how it is used, etc. The results from this question point to two main applications: the presentation of designs, and the collaborative design of products with those unfamiliar with design processes. One participant noted the benefit of AR in the early concept design stages, where prototypes may not be available or financially viable at the time. AR can provide designers with both a tool for presenting models and a technique for evaluating them. One participant noted that AR is an ‘interesting and captivating form of presentation.’ Designers who want to push for an idea may wish to use this tool to enhance the design.

From these results it is clear that AR adds value not only to design evaluation but also to the concept design stage as a whole. It supports collaborative design and provides an exciting and appealing form of presentation.


6 Conclusions 


The literature review found no previous research on the application of mobile augmented reality in the concept design stage of product design. This gap in knowledge formed the focus of the research.

The empirical study drew on the knowledge of participants in the field of product design. The research concluded that augmented reality does add value to the concept design process. It does so by enabling collaborative evaluation between those with experience in product design, i.e., designers, and those who may not be familiar with the processes, i.e., customers, clients, etc. The system was found to be highly intuitive and to allow early evaluation of concepts without the need to build prototypes. This, in turn, saves time and money, adding further value for the designer.

The implementation of augmented reality in this context is therefore expected to add considerable value. As augmented reality technology advances and the capability of mobile devices increases, this value is likely to grow in the coming years.
Authoring of Automatic Data Preparation and
Scene Enrichment for Maritime Virtual Reality 
Applications 
Fraunhofer Institute for Computer Graphics Research IGD, 18059 Rostock, Germany


Abstract. When realizing virtual reality scenarios for the maritime sector, a key challenge is dealing with the huge amount of data.

Manually adding interactive behaviour to provide a rich interactive experience requires a lot of time and effort. Additionally, even though shipyards today often use PDM or PLM systems to manage and aggregate the data, the export to a visualisation format is not without problems and often requires post-processing. We present a framework that combines the capability to process large amounts of data for preparing virtual reality scenarios with the ability to enrich them with dynamic aspects such as interactive door opening. An authoring interface allows non-expert users to orchestrate the data preparation chain and thus realise individual scenarios easily.


1 Introduction 


At our institute we have been doing research on virtual reality in the maritime sector for more than ten years. During this time, we have identified two key factors that need to be addressed when developing virtual reality applications:


1. The amount of data is huge and, when exported, is often divided into a large number of files. For example, a ship of medium complexity consists of one million individual parts, often split into tens of thousands of files. This data needs to be converted and optimised for visualisation.

2. The enrichment of scenes with dynamic aspects, e.g. for more realistic design reviews or training scenarios, requires large numbers of objects to be handled in a similar way. For example, for realistic lighting conditions, each lamp designed in CAD must be assigned a light source in the visualisation. Manual processing is time-consuming and expensive, making it a show-stopper for many VR applications in the maritime industry.


We address these issues with an extensible data processing framework capable of processing 3D geometry and performing geometry-specific operations such as the calculation of bounding boxes. The framework is built around modules that perform the actual processing and offers a selection of predefined modules for basic operations. Additionally, an authoring interface is provided that allows the orchestration of the modules.
2 Related Work


Authoring of dynamic virtual reality scenarios has received increasing attention during the last fifteen years. Specific authoring applications exist for a number of application domains, each with a tight focus on its domain. Examples are the “High-Level Tool for Curators of 3D Virtual Visits” of Chittaro et al. [3], the authoring approach for a “mixed reality assembly instructor for hierarchical structures” of Zauner et al., and commercial software products such as the Unity editor environment.

Generic authoring approaches are much harder to design and often provide more generic building blocks. Kapadia et al. introduce an approach for authoring behaviour in a simple behaviour description language [6], in which various constraints and settings can be specified. The focus is on authoring the behaviour itself.

Many approaches are grounded in the idea of object-based behaviour PJ5J4]. Here an object consists of geometry and behaviour, i.e. both form a unit. When authoring virtual worlds based on these approaches, the usual workflow is to create at least the interactive objects of the world within an authoring environment. The authoring environment is specific to the creation of behaviour objects and often provides only limited geometric modelling capabilities. Backman suggests a slightly different approach [1]. His authoring framework for virtual environments is also based on the notion of objects, but the link between physical behaviour and geometry is maintained through a link definition. For visual authoring, however, he uses an existing 3D modelling tool, and the object properties responsible for the behaviour are defined as annotations to the 3D objects.

A comprehensive approach to the authoring of “Compelling Scenarios in Virtual Reality” is presented by Springer et al. [10]. The authors describe a system consisting of several stages. The system addresses automatic scenario generation by creating objects in a predefined way based on, e.g., 2D terrain images. Further, scenario editing is provided, supporting the creation of additional objects. The geometry is loaded from external files, while the object behaviour must be implemented in the presented application. Finally, an immersive viewer displays the scenario and can be coupled with the scenario editor to support live editing of the scenarios. The idea of a tight coupling between scenario editor and immersive environment is also pursued by Lee et al. [7], who describe an authoring system for VR scenarios that allows the authoring itself to be done within VR. The approach is named “immersive authoring and testing” and, according to the authors, avoids frequent changes between the desktop and the VR system.

Zook et al. [12] go a step further in automatic scenario generation. They propose creating training scenarios based on computer-stored world knowledge, learning objectives and learner attributes. For a specific domain, the approach can generate a large number of different training scenarios for training different objectives in various combinations. It requires a high initial effort to store the world knowledge and learning objectives in a computer-processable form.
3 Data Processing and Enrichment Framework


Current state-of-the-art authoring approaches provide the means either to create dynamic scenes from the ground up or to manually enrich existing geometric models with behaviour. In the latter case, the geometry is usually exported from a CAD system or a 3D modelling tool. This approach works well when only a limited set of objects needs to be interactive and when the basic 3D geometry does not change over time. In 2006 we presented a first approach to an automatic process for enriching geometry with behaviour to address this issue [8]. However, this approach was limited in that it could address geometric objects solely by name and was closely tied to VRML.

We present a framework that allows specific behaviour to be applied to a large number of objects based on custom selection mechanisms. The work presented in this paper builds on our previous work but features an open architecture and additionally incorporates flexible mechanisms for data processing. Apart from adding behaviour to geometry, the framework also supports the pre-processing of geometric data, such as data conversion or geometry cleanup. It can be used to define a fully automated data processing chain from a CAD model to an interactive virtual reality scenario.

The new architecture consists of a generic data processing platform which can, in principle, handle any kind of data. The data processing flow can be defined using a graphical authoring environment, enabling non-IT experts to set up the data conversion chain for a VR session. The platform has a strong focus on 3D content and behaviour enrichment.


3.1 Data Processing Framework 


In this section we will discuss the data processing framework in more detail. The 
basic components of the framework are: 


Modules perform the actual data processing: they receive data, operate on it and usually transform it in some way. Each module contains a set of typed in-slots for the data to be processed, a set of typed out-slots for the data generated by the module, and a set of attributes to configure various parameters of the module.

Components are a special form of module. They can also perform data processing and, additionally, can contain other modules or components, called inner modules. In addition to the normal in- and out-slots, they can also contain internal in- and out-slots. Internal slots are used to pass data available to the component on to its inner modules, and to pass data generated by the inner modules back to the surrounding component.

Routes describe the data flow between modules and components, i.e. they connect an out-slot of one module with an in-slot of another module.


Figure 1 illustrates the usage of modules and routes. The Creator module creates an X3D scene based on the X3D code specified in its x3dCode attribute and sends the scene to its out-slot. The Writer module receives the scene and writes it to an X3D file.

[Figure: modules Creator (X3DNodeCreator, x3dCode : <X3D> <Scene><Shape><Sphere/></Sha...) and Writer (X3DNodeWriter, attribute x3dFile)]

Fig. 1. A simple data processing chain: the module Creator creates an X3D scene and the module Writer writes the scene to the hard drive
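To make the module, slot and route model concrete, here is a minimal Python sketch of the Creator/Writer chain of Fig. 1 (the framework itself is implemented in Java and its API is not shown in the paper). The names `Module`, `Route`, `run_chain` and the slot name `scene` are hypothetical; typed slots are reduced to slot names, modules are assumed to be listed in data-dependency order, and the Writer stores the scene in a dict instead of on disk.

```python
from dataclasses import dataclass, field
from typing import Any, Callable, Dict, List

# Hypothetical processing signature: (inputs by in-slot, attributes) -> outputs by out-slot
ProcessFn = Callable[[Dict[str, Any], Dict[str, Any]], Dict[str, Any]]


@dataclass
class Module:
    """Stand-in for a framework module; slot types are omitted in this sketch."""
    name: str
    in_slots: List[str]
    out_slots: List[str]
    process: ProcessFn
    attributes: Dict[str, Any] = field(default_factory=dict)


@dataclass
class Route:
    """Connects an out-slot of one module to an in-slot of another module."""
    src: str
    src_slot: str
    dst: str
    dst_slot: str


def run_chain(modules: List[Module], routes: List[Route]) -> Dict[str, Dict[str, Any]]:
    """Run modules in list order, forwarding each produced value along its routes."""
    by_name = {m.name: m for m in modules}
    for r in routes:  # reject routes that reference undeclared slots
        if r.src_slot not in by_name[r.src].out_slots or r.dst_slot not in by_name[r.dst].in_slots:
            raise ValueError(f"invalid route {r}")
    pending: Dict[str, Dict[str, Any]] = {m.name: {} for m in modules}
    outputs: Dict[str, Dict[str, Any]] = {}
    for m in modules:
        result = m.process(pending[m.name], m.attributes)
        outputs[m.name] = result
        for r in routes:
            if r.src == m.name and r.src_slot in result:
                pending[r.dst][r.dst_slot] = result[r.src_slot]
    return outputs


# Usage: Creator -> Writer, with the "file" kept in a dict for illustration
written: Dict[str, str] = {}


def create(inputs: Dict[str, Any], attrs: Dict[str, Any]) -> Dict[str, Any]:
    return {"scene": attrs["x3dCode"]}


def write(inputs: Dict[str, Any], attrs: Dict[str, Any]) -> Dict[str, Any]:
    written[attrs["x3dFile"]] = inputs["scene"]
    return {}


creator = Module("Creator", [], ["scene"], create,
                 {"x3dCode": "<X3D><Scene><Shape><Sphere/></Shape></Scene></X3D>"})
writer = Module("Writer", ["scene"], [], write, {"x3dFile": "out.x3d"})
outputs = run_chain([creator, writer], [Route("Creator", "scene", "Writer", "scene")])
```

Validating routes against declared slots before execution mirrors the role of typed slots: a mis-wired chain fails early rather than mid-run.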

An example of a component is given in Fig. 2. The ForEachComponent receives a list of objects at its objectsIn slot. When executed, it creates instances of its inner modules for each element of the list and sends the object to be processed through its inner out-slot listObject. In the example, a list of X3D objects is processed and a Transform node is inserted around each of the nodes. The resulting objects are collected by the ForEach component and released via its out-slot objectsOut once all objects have been processed.


[Figure: component Processor (ForEachComponent) containing the inner module TransformCreator (X3DNodeCreator) with x3dCode : <Transform></Transform>]

Fig. 2. The component Processor receives the list of objects and for each of those 
objects runs its inner modules. The resulting objects are collected by the component 
and released via the out-slot objectsOut once all objects have been processed. 
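The iteration semantics of the ForEach component can be sketched in a few lines of Python using the standard `xml.etree.ElementTree` module. This is an illustration under assumptions, not the framework's implementation: inner modules are modelled as stateless functions (so "creating instances per element" reduces to calling them again), the run is sequential, and the function names are hypothetical.

```python
import xml.etree.ElementTree as ET
from typing import Callable, List

# Hypothetical model of an inner module: one X3D node in, one processed node out
InnerModule = Callable[[ET.Element], ET.Element]


def wrap_in_transform(node: ET.Element) -> ET.Element:
    """Inner module: insert a Transform node around the given node."""
    transform = ET.Element("Transform")
    transform.append(node)
    return transform


def for_each_component(objects_in: List[ET.Element],
                       inner_modules: List[InnerModule]) -> List[ET.Element]:
    """Run the inner modules once per list element and collect the results in order."""
    objects_out: List[ET.Element] = []
    for obj in objects_in:            # element handed to the inner chain (cf. listObject)
        for module in inner_modules:  # apply the inner modules one after another
            obj = module(obj)
        objects_out.append(obj)       # collect the processed element
    return objects_out                # released together, cf. objectsOut


shapes = [ET.fromstring('<Shape DEF="a"/>'), ET.fromstring('<Shape DEF="b"/>')]
result = for_each_component(shapes, [wrap_in_transform])
for node in result:
    print(ET.tostring(node, encoding="unicode"))
```

Releasing the results only after the loop finishes matches the description that objectsOut fires once all objects have been processed.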


The platform offers an authoring environment that allows users to visually combine the data processing modules, define their attributes and connect them via routes. The illustrations of data processing modules throughout this paper have been exported from this authoring environment. With the predefined modules provided by the framework, common data processing tasks for preparing geometry can be realised. These include iterating over a collection of objects, selecting specific items, and processing files with external tools such as geometry converters or optimisers.


3.2 Creating Custom Modules 


The open architecture of the data processing framework allows users to develop modules and components for their specific application domains. When developing a new module or component, its interface must be described in an XML definition and its behaviour must be implemented in the Java programming language. The definition includes the module's name, its attributes, the in- and out-slots it supports, and the Java classes implementing the data processing. This task is suitable for IT experts only.

Once the custom modules have been developed, the end user can use them in the same way as the modules predefined by the framework. In fact, the predefined modules are no different from custom modules. For example, the ForEach component shown in Fig. 2 is just an ordinary component: the functionality to iterate over the list of objects and send each object to the in-slots of its inner modules is implemented within the component implementation. A module designer could decide to remove the available ForEach component and provide their own implementation. Figure 3 gives an overview of the roles involved.


[Figure: three roles. Framework Developer: implemented in Java. Module Designer: defined in XML, implemented in Java. End User: defined within the authoring environment.]


Fig. 3. The structure of modules and components, as well as the type system, is defined by the framework developer. The module designer can define custom modules for their specific application domain, which can then be used by the end user.
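The paper states what an XML module definition contains (name, attributes, in- and out-slots, implementing Java classes) but not its schema. The sketch below invents a plausible descriptor purely for illustration and reads it with Python's `xml.etree.ElementTree`; the element names `module`, `inSlot`, `outSlot`, `attribute` and the class `org.example.modules.BBoxCreator` are all hypothetical.

```python
import xml.etree.ElementTree as ET
from dataclasses import dataclass
from typing import Dict, List, Tuple

# Hypothetical descriptor: element and attribute names are assumptions, not the framework's schema
DESCRIPTOR = """
<module name="BBoxCreator" class="org.example.modules.BBoxCreator">
  <inSlot name="node" type="X3D"/>
  <outSlot name="bbox" type="BoundingBox"/>
  <attribute name="padding" type="double" default="0.0"/>
</module>
"""


@dataclass
class ModuleDefinition:
    name: str
    java_class: str                         # class implementing the data processing
    in_slots: List[Tuple[str, str]]         # (slot name, slot type)
    out_slots: List[Tuple[str, str]]
    attributes: Dict[str, Tuple[str, str]]  # attribute name -> (type, default)


def parse_definition(xml_text: str) -> ModuleDefinition:
    """Read the interface information an authoring tool would need to display a module."""
    root = ET.fromstring(xml_text.strip())
    return ModuleDefinition(
        name=root.get("name"),
        java_class=root.get("class"),
        in_slots=[(s.get("name"), s.get("type")) for s in root.findall("inSlot")],
        out_slots=[(s.get("name"), s.get("type")) for s in root.findall("outSlot")],
        attributes={a.get("name"): (a.get("type"), a.get("default", ""))
                    for a in root.findall("attribute")},
    )


definition = parse_definition(DESCRIPTOR)
print(definition)
```

Separating the declarative interface from the implementing class is what lets the authoring environment show and wire custom modules exactly like predefined ones.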


3.3 Geometry Modification and Behaviour Enrichment


The main focus of the platform is the preparation of geometry for interactive virtual reality scenarios. This is reflected by special support for the X3D language: in particular, a dedicated type for X3D data and a set of modules specific to processing X3D geometry are provided.

The most important modules are presented hereafter. The X3DCreator module can read whole X3D files as well as X3D code fragments. The code can consist of any possible top-level node and can thus be used to create, e.g., TouchSensors, Script nodes, or additional geometry as required for the realisation of arbitrary behaviour. The X3DWriter module can write X3D code to a file. The X3DNodeSelector supports the selection of specific nodes from an X3D scene. Selection can be based either on node names, using wildcard expressions, or on XPath expressions. Using XPath as the selection criterion is very flexible and can, e.g., be used to select nodes based on associated meta information, which is stored in separate Metadata nodes in X3D. Finally, the NodeInserter can insert one node inside or around another. This may be required, e.g., if a Transform node must be added to allow the movement of an object or if a TouchSensor should be added to allow interaction with the object. Fig. 4 shows how different modules are combined to add a Transform node around all doors within a scene.


[Figure: modules Writer (X3DNodeWriter, x3dFile : outfile.x3d), Processor (ForEachComponent) and Insert (NodeInserter, relation : around)]

Fig. 4. Combining modules to add a Transform node around all doors within a scene
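The selection-by-wildcard and insert-around steps can be sketched with Python's standard `fnmatch` and `xml.etree.ElementTree` modules. This is a hypothetical stand-in for X3DNodeSelector and NodeInserter, assuming node names are carried in the X3D `DEF` attribute; the sample scene and the `_T` suffix for new nodes are invented. ElementTree has no parent pointers, so a child-to-parent map is built before a node is wrapped.

```python
import fnmatch
import xml.etree.ElementTree as ET
from typing import List

# Invented sample scene for illustration
SCENE = """<X3D><Scene>
  <Transform DEF="deck1">
    <Shape DEF="door_01"/><Shape DEF="wall_01"/><Shape DEF="door_02"/>
  </Transform>
</Scene></X3D>"""


def select_by_name(root: ET.Element, pattern: str) -> List[ET.Element]:
    """Select nodes whose DEF name matches a shell-style wildcard (cf. X3DNodeSelector)."""
    return [n for n in root.iter() if fnmatch.fnmatchcase(n.get("DEF", ""), pattern)]


def insert_around(root: ET.Element, node: ET.Element, new_tag: str, def_name: str) -> ET.Element:
    """Wrap `node` in a new parent node of type `new_tag` (cf. NodeInserter, relation 'around')."""
    parent_of = {child: parent for parent in root.iter() for child in parent}
    parent = parent_of[node]
    index = list(parent).index(node)  # keep the original position among siblings
    wrapper = ET.Element(new_tag, {"DEF": def_name})
    parent.remove(node)
    wrapper.append(node)
    parent.insert(index, wrapper)
    return wrapper


root = ET.fromstring(SCENE)
for door in select_by_name(root, "door*"):  # selection is finished before the tree changes
    insert_around(root, door, "Transform", door.get("DEF") + "_T")
print(ET.tostring(root, encoding="unicode"))
```

ElementTree's built-in path syntax supports only a subset of XPath; selecting by metadata as described above would likely need a full XPath engine.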


3.4 Maritime Enrichment Scenario 


We have used the platform to realise data processing for different maritime application scenarios. This section describes the realisation of one such scenario.

In this scenario we received a ship model from the shipyard Flensburger Schiffbau Gesellschaft consisting of approximately 20,000 geometry files and more than 100,000 objects. To allow an interactive walk-through, we needed to add the functionality to open and close the doors within the model. We set up the data processing chain shown in Fig. 5, which sets up a VR scene and automatically enriches it with the interactive behaviour. It first scans the content of the directories where the geometry files are located and then creates an X3D Group node containing an Inline node for each of the files. A filter then selects all nodes whose names match the pattern 216* (indicating a door). The list of nodes is passed to the ForEachComponent, where one module calculates the bounding box of each door and a second module inserts the behaviour to open and close it. The route from the Inline module to the ForEach component merely makes the grouping node available to the ForEach component. The resulting file is then written to the specified X3D file by the X3DNodeWriter.

[Figure: modules Dir (DirModule), Detector (X3DNodeSelector), ForEach (ForEachComponent, attribute maxParallelRuns), BBoxCreator and Writer (X3DNodeWriter)]

Fig. 5. Data processing chain to enrich a ship scene with the behaviour to allow opening and closing of doors

Fig. 6. Walk-through of an enriched 3D scene: when the user approaches, the door opens automatically

Figure 6 illustrates such a walk-through: when the user approaches a door, it opens automatically.
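The first steps of this chain (one Inline node per geometry file, then filtering by the door pattern 216*) can be sketched in Python as follows. The file names are hypothetical, and so is the assumption that each Inline's `DEF` name is the file name without extension; in practice the names might come from a directory scan such as `pathlib.Path(directory).glob("*.x3d")`. The X3D `url` field is an MFString, hence the inner quotes.

```python
import fnmatch
import xml.etree.ElementTree as ET
from typing import Iterable, List


def build_inline_group(file_names: Iterable[str]) -> ET.Element:
    """Create one X3D Group containing an Inline node per geometry file."""
    group = ET.Element("Group", {"DEF": "Ship"})
    for file_name in sorted(file_names):
        stem = file_name.rsplit(".", 1)[0]
        # Assumption: the node name is the file name without its extension
        ET.SubElement(group, "Inline", {"DEF": stem, "url": f'"{file_name}"'})
    return group


def select_doors(group: ET.Element, pattern: str = "216*") -> List[ET.Element]:
    """Select the child nodes whose names match the door pattern of the scenario."""
    return [n for n in group if fnmatch.fnmatchcase(n.get("DEF", ""), pattern)]


files = ["216_0001.x3d", "301_0042.x3d", "216_0002.x3d"]  # hypothetical file names
ship = build_inline_group(files)
doors = select_doors(ship)
print([d.get("DEF") for d in doors])
```

Because the pattern is a parameter, reusing the chain for another ship mainly means changing this one value, which is the kind of adjustment the text mentions.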

Since the model was too large to be displayed smoothly as a whole, we set up additional data processing steps to implement on-demand loading of objects: objects are shown only when the user is within a certain distance of them and are hidden again when the user moves away. The modules for this data processing consist of the BoundingBox creator module, which calculates the bounding boxes of the individual objects, and an OnDemandLoader module, which inserts a ProximitySensor node for each object and shows or hides the object as appropriate. More information on this industry scenario can be found in our paper [9].
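A minimal sketch of the geometric part of on-demand loading, assuming an axis-aligned bounding box per object and a ProximitySensor region equal to that box enlarged by a fixed margin. X3D's ProximitySensor does define `center` and `size` fields, but how the OnDemandLoader actually wires sensor output to showing and hiding is not described, so that wiring is left out. The function names and the margin value are hypothetical.

```python
import xml.etree.ElementTree as ET
from typing import Iterable, Tuple

Vec3 = Tuple[float, float, float]
Box = Tuple[Vec3, Vec3]  # (min corner, max corner)


def bounding_box(points: Iterable[Vec3]) -> Box:
    """Axis-aligned bounding box of an object's vertices."""
    xs, ys, zs = zip(*points)
    return (min(xs), min(ys), min(zs)), (max(xs), max(ys), max(zs))


def _fmt(values: Iterable[float]) -> str:
    """Format an SFVec3f value as it appears in an X3D attribute."""
    return " ".join(f"{v:g}" for v in values)


def proximity_sensor(bbox: Box, margin: float, def_name: str) -> ET.Element:
    """ProximitySensor covering the bounding box enlarged by `margin` on every side."""
    lo, hi = bbox
    center = [(a + b) / 2 for a, b in zip(lo, hi)]
    size = [(b - a) + 2 * margin for a, b in zip(lo, hi)]
    return ET.Element("ProximitySensor",
                      {"DEF": def_name, "center": _fmt(center), "size": _fmt(size)})


def should_show(viewer: Vec3, bbox: Box, margin: float) -> bool:
    """The test the sensor region encodes: is the viewer inside the enlarged box?"""
    lo, hi = bbox
    return all(a - margin <= p <= b + margin for p, a, b in zip(viewer, lo, hi))


door_box = bounding_box([(0.0, 0.0, 0.0), (1.0, 2.0, 0.1), (0.5, 1.0, 0.05)])
sensor = proximity_sensor(door_box, 5.0, "216_0001_PS")
print(ET.tostring(sensor, encoding="unicode"))
```

Deriving the sensor region from the bounding box keeps the step fully automatic, so it can run inside the same ForEach iteration as the other per-object modules.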

Implementing the behaviour enrichment with our data processing framework has two major advantages. First, the chain can be executed multiple times, making it applicable to new versions of the CAD geometry or, usually with only a few adjustments such as the name pattern, to entirely new ships. Second, it allows the behaviour to be applied to large numbers of objects in very little time.
4 Summary and Outlook


In industrial virtual reality scenarios, the geometry for the scenario is usually provided by CAD or modelling tools. In this paper we have presented our approach for authoring the data preparation chain for geometry processing and behaviour enrichment. The behaviour enrichment of large numbers of objects is handled through a general object selection mechanism and an iteration mechanism that allows homogeneous objects to be processed. The framework enables users to quickly set up interactive virtual reality scenarios. Once a processing chain is set up, it can be applied to updated versions of the geometry and to other base geometries with little effort.

In future, we plan to extend our approach in different directions. One direction is to allow components with integrated modules to be combined into high-level building blocks, providing a higher level of abstraction to the user while still supporting fine-grained data processing modules. Another direction is to ease the authoring process by increasing the graphical capabilities. Currently, objects in an X3D scene must be selected manually, e.g. by name. Graphically presenting a loaded X3D scene and allowing selection through its visual representation would further ease the authoring process.
Abstract. This paper proposes an AR (augmented reality) based vehicular safety information system that provides warning information allowing drivers to easily avoid obstacles without being visually distracted. The proposed system consists of four stages: fusion data based object tracking, collision threat assessment, AR registration, and a warning display strategy. Experiments show that the proposed system can predict the threat of a collision with a tracked forward obstacle even at night and under bad weather conditions. The system can provide safety information for avoiding collisions by projecting information directly into the driver’s field of view. The proposed system is expected to help drivers by conveniently providing safety information and allowing them to safely avoid forward obstacles.


Keywords: AR (augmented reality), vehicular safety information, forward col- 
lision, warning system, data fusion, object tracking, threat assessment, warning 
strategy. 


1 Introduction 


To avoid collisions with stationary obstacles, other moving vehicles, or pedestrians, 
drivers have to be aware of the possibility of a collision and be ready to start braking 
early enough. In addition, when following other vehicles, drivers need to keep a safe 
distance to allow for proper braking. An understanding of how drivers maintain such 
a safe distance, the type of visual information they use, and what visual factors affect 
their performance is clearly important for improving road safety. A driver has to rely 
on direct visual information to know how rapidly they are closing in on a forward 
vehicle. Therefore, if this information is poor, there is a danger of the driver not suffi- 
ciently braking in time. In addition, a system for the rapid detection of neighboring 
objects such as vehicles and pedestrians, a quick estimation of the threat of an obstacle, and a convenient way to avoid predicted collisions is needed. Automobile manufacturers are highly concerned about problems related to motor vehicle safety and are making great efforts to solve them, for example through adaptive cruise control (ACC) [1], antilock brake systems (ABSs) [2, 3], collision-warning systems (CWSs) [4], and emergency automatic brakes (EABs). AR-based driving support systems (AR-DSSs) have also been developed recently [5, 6]. These AR-DSSs differ from traditional in-vehicle collision avoidance systems (CASs) in that they provide warning signals overlapping with real physical objects. Compared to a traditional CAS, an AR-DSS attempts to support the direct perception of merging traffic rather than the generation of a warning signal. An AR-DSS can therefore lower the switching costs associated with a traditional CAS by providing signals that align with a driver’s perceptual awareness [5]. Accordingly, active visual-based safety information systems for preventing collisions have become one of the major research topics in the field of safe driving. This paper therefore proposes an AR-based vehicular safety information system that provides visual collision warning information matched to the driver’s viewpoint. We expect that the proposed system will contribute significantly to reducing the number and severity of driving accidents.


2 AR-Based Vehicular Safety Information System


The proposed system consists of four stages: fusion data based object tracking, colli- 
sion threat assessment, AR-registration, and a warning display strategy. An I/O flow- 
chart of the proposed system is presented in Fig. 1. Once a driver starts driving, the 
system continuously detects and tracks forward objects and classifies the collision 
threat level. Simultaneously, the system tracks the driver’s eye movement and 
presents potential collision threats on a see-through display; the results are then 
matched with the driver’s visual viewpoint to help the driver identify and avoid ob- 
stacles. Unlike conventional ABS, the goal in this paper is to provide forward-object 
location based on the driver's viewpoint through an interactive AR-design for main- 
taining a safe distance from forward objects and preventing collisions. Thus, two 
modules, i.e., collision threat assessment and a warning display strategy, will be de- 
scribed in detail. 
Fig. 1. The proposed system configuration


2.1 Fusion Data Based Object Tracking


To track forward objects accurately and robustly, the proposed system uses both video and radar information, which provide important clues regarding the ongoing traffic activity in the driver’s path. Fig. 2 shows the fusion-based tracking process.
Fig. 2. Sensor Data Fusion for object tracking
In vision-based object tracking, all objects on a road can be deemed potential obstacles. In this system, we first extract all obstacles using their geometric properties. We also classify them into significant and less significant objects, which are triggered under certain circumstances. Significant objects are obtained using specialized detectors (i.e., vehicle and pedestrian detectors) [7, 8]. In low-visibility environments, the proposed system detects multiple forward obstacles using three radar systems, and then recognizes and classifies them based on fusion with a night-vision camera, following the process shown in the flowchart in Fig. 2.


2.2 Collision Threat Assessment 


The threat assessment measure of the proposed system is defined in Eq. (1). This 
measure is based on the basic assumption that a threatening object is in the same lane 
as the host vehicle, and is the closest object ahead. The proposed system estimates the 
collision possibility using the velocity and distance between the host vehicle and ob- 
stacle, which is referred to as TTC (time to collision) in this paper. To measure the 
TTC, an experimental DB is first generated, and the optimal threshold value is then 
extracted using this DB. 


$$\mathrm{TTC} = \frac{D_{ho}}{V_h} \qquad (1)$$

where $D_{ho}$ is the distance between the host car and the obstacle, and $V_h$ is the velocity of the host car.
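As a worked example of Eq. (1), write D for the distance between the host car and the obstacle and V for the host velocity, and take illustrative values D = 100 m and V = 60 km/h (chosen for illustration only; they are not reported results). Converting the speed to m/s first gives the TTC in seconds:

```latex
V = 60\ \mathrm{km/h} = \frac{60\,000\ \mathrm{m}}{3600\ \mathrm{s}} \approx 16.67\ \mathrm{m/s},
\qquad
\mathrm{TTC} = \frac{D}{V} = \frac{100\ \mathrm{m}}{16.67\ \mathrm{m/s}} \approx 6.0\ \mathrm{s}
```

Eq. (1) uses only the host speed, which corresponds to a stationary obstacle; the general definition given with reference [9] uses the speed difference between the two road users instead.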


Table 1. The compiled DB under various conditions

Type        Driving condition                          The compiled DB
Vehicle     Velocity (V): 60 km/h                      Acquired images including more than 500 vehicles,
            Distance (D): 100 (remainder illegible)    taken during an 18-hour period
            Road type: (illegible)
Pedestrian  Velocity (V): 40 km/h                      Acquired images including more than 800 pedestrians,
            Distance (D): (illegible)                  taken during a 12-hour period
            Road type: Crossway
The proposed system divides a threat into three levels according to the TTC value. To
determine the TTC range of each of the three levels, an experimental DB covering various
driving conditions is first generated, as shown in Table 1. Next, the optimal threshold
values are extracted from the DB, as shown in Table 2. In general, the TTC is defined as
the time remaining until a collision between two vehicles would occur if the collision
course and the speed difference were maintained [9]. The TTC has been one of the
well-recognized safety indicators for traffic conflicts on highways [10-12]. However,
the proposed system provides safety warnings matched to the driver's viewpoint
through an interactive AR design, and it is applied to public road environments
for both vehicles and pedestrians. Therefore, the TTC thresholds used by the proposed
system are extracted through various experiments.


Table 2. TTC threshold values of each of the three levels (s)

Level           Vehicle                    Pedestrian
1 (Danger)      0.0 ≤ threshold < 0.3      0.0 ≤ threshold < 1.1
2 (Warning)     0.3 ≤ threshold < 0.7      1.1 ≤ threshold < 6.0
3 (Attention)   0.7 ≤ threshold < 5.0      6.0 ≤ threshold < 10.0
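The level assignment in Table 2 amounts to a lookup over contiguous TTC ranges per obstacle type. A minimal sketch, assuming the ranges are lower-inclusive and upper-exclusive as read from Table 2 and that a TTC beyond the Attention range triggers no warning; `threat_level` and the dictionary names are hypothetical.

```python
from typing import Optional

# Upper bounds (seconds) of levels 1-3 from Table 2; level 1 starts at 0 and each
# range is taken as lower-inclusive, upper-exclusive.
TTC_UPPER_BOUNDS = {
    "vehicle": (0.3, 0.7, 5.0),
    "pedestrian": (1.1, 6.0, 10.0),
}
LEVEL_NAMES = {1: "Danger", 2: "Warning", 3: "Attention"}


def threat_level(obstacle_type: str, ttc_s: float) -> Optional[int]:
    """Return 1 (Danger), 2 (Warning), 3 (Attention), or None if no warning is needed."""
    if ttc_s < 0:
        raise ValueError("TTC must be non-negative")
    for level, upper in enumerate(TTC_UPPER_BOUNDS[obstacle_type], start=1):
        if ttc_s < upper:
            return level
    return None
```

Keeping the thresholds in a table rather than in branching code mirrors how they were obtained: they come from the experimental DB and may be re-extracted without changing the logic.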


2.3. AR Registration 


For the registration, the calibration parameters are generated offline by expressing
the relations among the three coordinate systems of the vehicle, the driver, and the
display. The system then detects and tracks the driver’s head and eyes in real time,
and the coordinates of the target objects are transformed into display coordinates
matching the driver’s viewpoint. A flowchart of this AR registration module is shown in Fig. 3.
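The text does not give the transformation itself. One common way to realize it, shown here as a sketch under assumptions, is to intersect the line of sight from the driver's eye to the object with the display plane, with all points expressed in the vehicle frame after offline calibration. The display is modelled as a plane with an origin, two orthonormal in-plane axes and a normal; all names and the example geometry are hypothetical.

```python
from typing import Tuple

Vec = Tuple[float, float, float]


def sub(a: Vec, b: Vec) -> Vec:
    return (a[0] - b[0], a[1] - b[1], a[2] - b[2])


def dot(a: Vec, b: Vec) -> float:
    return a[0] * b[0] + a[1] * b[1] + a[2] * b[2]


def project_to_display(eye: Vec, obj: Vec, origin: Vec,
                       u_axis: Vec, v_axis: Vec, normal: Vec) -> Tuple[float, float]:
    """Where the eye->object ray crosses the display plane, in display (u, v) metres.

    All inputs are in the vehicle frame; u_axis and v_axis are unit vectors
    spanning the display and normal is perpendicular to both.
    """
    ray = sub(obj, eye)
    denom = dot(normal, ray)
    if abs(denom) < 1e-9:
        raise ValueError("line of sight is parallel to the display")
    t = dot(normal, sub(origin, eye)) / denom
    if t <= 0:
        raise ValueError("display plane lies behind the driver's eye")
    hit = (eye[0] + t * ray[0], eye[1] + t * ray[1], eye[2] + t * ray[2])
    rel = sub(hit, origin)
    return dot(rel, u_axis), dot(rel, v_axis)


# Hypothetical geometry: x forward, y left, z up; display 0.8 m ahead of the eye.
u, v = project_to_display(eye=(0.0, 0.0, 1.2), obj=(20.0, -1.0, 0.7),
                          origin=(0.8, 0.3, 0.9), u_axis=(0.0, -1.0, 0.0),
                          v_axis=(0.0, 0.0, 1.0), normal=(1.0, 0.0, 0.0))
```

Because the eye position enters every projection, re-running this per frame with the tracked head and eye position is what keeps the warning aligned with the driver's viewpoint. Converting (u, v) to pixels would additionally need the display's size and resolution.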
Fig. 3. AR registration module


2.4 Warning Display Strategy 


To improve the driver’s cognition of the displayed information, an interactive UX
design is needed. To this end, the information provided should be not only easier to
understand but also more intuitively acceptable, by taking into account the driver's
characteristics, the type of information provided, and the driving conditions. The
system presents information differently depending on the threat level determined by
the previous module and on the study results of [13]. Table 3 shows the AR-display
design and a representative scene on the see-through display for the three threat
levels of each obstacle type. In the AR-display design, the color and line thickness
are set according to ISO rules [14]. In addition, the design type was determined from
the study results of [13] and the HUD concept design in [15].


Table 3. AR-display design for three levels of obstacle type

Obstacle type   Design type      Level and real display
Vehicle         Type 1 (STOP)    (images)
Pedestrian      Type 2 (STOP)    (images)


3 Experiment Results 


To provide driving-safety information using the proposed AR-HUD, various sensors
and devices were attached to the experimental test vehicle, as shown in Fig. 4. The
two cameras used for forward obstacle recognition are GS2-FW-14S5 models from
Point Grey Research Co. with 12 mm lenses; they have a resolution of 1384 × 1036,
capture images at 30 fps, and use an IEEE 1394b interface. For multi-target tracking,
two SRRs (short-range radars) and one LRR (long-range radar) are used in environments
with poor visibility, such as in rain and at night. Both radar types are Delphi ESR
units operating at 77 GHz with a CAN interface. The IR camera is a PathfindIR model
from FLIR Co., with a resolution of 320 × 240 at 30 fps and an RCA interface.
Fig. 4. Experimental test vehicle


To show our AR-based vehicle safety information, we used a 22-inch transparent
Samsung LCD with a transparency of 20%. Because of this low transparency, the
AR-based vehicle safety information cannot be seen very well at night. To solve this
problem, a large-area transparent OLED-based display needs to be developed. Fig. 5
shows pedestrians detected by the proposed system and presented on the display
according to the estimated optimal TTC thresholds.


Fig. 5. Experiment results: (A) stopping in a crosswalk, (B) jaywalkers crossing a public
road, (C) jaywalkers crossing a residential street


To evaluate each module, the experimental DB was generated from various driving
environments, including a simulated road environment and actual roads (a highway,
public roads, and residential streets). For vehicle recognition in the daytime, a total of
10,320 frames were obtained from the experimental stereo camera. For pedestrian
recognition, a total of 3,270 frames were acquired. Furthermore, a total of 5,400
frames were obtained from the IR camera for recognition of both vehicles and
pedestrians at night. Fig. 6 shows the real-road test region, which includes public
roads, residential streets, and crossways for daytime recognition of both vehicles and
pedestrians. Fig. 7 shows the test-bed used for obstacle recognition at night.
Fig. 6. Experiment test region on real roads


Fig. 7. Experiment test-bed for nighttime recognition 


During the daytime, the proposed system recognizes driving-safety information with a
recognition rate of 85.01% at a speed of 15 fps for both vehicles and pedestrians.
During the nighttime, the recognition rate and speed are 77% and 10 fps, respectively,
for both vehicles and pedestrians.


4 Conclusions 


This paper proposed an AR-based vehicular safety information system for forward
collision warning. This paper showed that 1) forward obstacles can be successfully
detected and tracked by fusing radar with two types of vision data, 2) fusion-based
forward obstacle tracking is more robust than single-sensor obstacle detection, and
objects can be reliably detected, 3) collision threats can be efficiently classified into
threat levels by measuring the collision possibility of each obstacle, 4) AR registration
can provide warning information without visual distraction by matching the driver’s
viewpoint, and 5) the warning strategy can conveniently provide safety information
considering both the obstacle and human-vision attributes. The experiment results show
that the proposed system achieves an 81.01% recognition rate. We expect that, by
providing information suited to the driver's viewpoint, the proposed system can help
reduce traffic accidents.
Abstract. The paper proposes a framework for supporting maintenance services
in industrial environments through the use of a mobile device and Augmented
Reality (AR) technologies. 3D visual instructions about the task to carry out are
represented in the real world by means of AR and are visible through the mobile
device. Beyond the solutions proposed so far, the framework introduces the
possibility of monitoring the operator’s work from a remote location. The mobile
device stores information for each completed maintenance step and makes it
available on a remote database. Supervisors can consequently check the maintenance
activity from a remote PC at any time. The paper also presents a prototype system,
developed according to the framework, and an initial case study in the food industry.


Keywords: Augmented Reality, Framework, Maintenance tasks, Remote Supervision.


1 Introduction 


Maintenance operations in a factory are necessary to ensure the continuous
functioning of machinery and production. In many cases, operators are trained to
acquire the skills needed to intervene on the machines at scheduled times and to
follow proper procedures. However, the outcome of these maintenance operations and
the time needed to complete them are always uncertain. The uncertainties stem from
the difficulties the operator must face to complete the maintenance task, such as the
functional and mechanical complexity of the machine.

The level of uncertainty increases when the maintenance operation is not routine
work, for example an unforeseen or urgent event that the operator is not used to
handling. In other cases, an operator carries out a maintenance task even though his
background is not sufficient to accomplish it autonomously or accurately, such as when
the operator gets confused because he deals with several similar machines, or when an
unskilled operator performs the task. This usually happens in order to avoid the
intervention of an expert operator, which could be costly and involve a long wait. In
such cases, the operator is traditionally supported by an instruction manual.

However, maintenance operations carried out superficially or without complying with
the protocols can lead to machine malfunctions. A malfunction can be dangerous to
the people working in the factory, or it can lead to additional maintenance due to
unexpected machine failures. It follows that the complexity of the machine,
inexperience, negligence, and the human predisposition to error affect maintenance
effectiveness. Consequently, these issues have a negative impact on a machine,
affecting its production in a plant and its working life, and in the long term they
increase industrial costs.

In light of these considerations, current research aims to reduce maintenance costs
by improving the operator’s performance at work. In particular, one of the main
trends aims at reducing the time and money spent on training by providing
instructions that are more accessible and easier to understand, even for unskilled
operators.

Augmented Reality (AR), an emerging technology from computer science, offers a
great advantage in the field of maintenance. AR enables the user to see and interact
with virtual contents seamlessly integrated into the real environment [1, 2]. In
maintenance operations, the virtual contents are the instructions to perform, which
can be represented as text or as three-dimensional objects. Hence, the instructions
are provided in a way that is more direct, accessible, and easier to understand than a
traditional paper manual.

In this research work, the authors describe a framework that extends the AR
solutions proposed so far for supporting maintenance tasks. The framework combines
a method to provide maintenance instructions to the operator by means of a mobile
device with a solution to record and monitor the performed tasks from a remote
location. The mobile device shows the instructions to the operator using AR and, at
the same time, sends data and pictures regarding the ongoing maintenance task to a
remote PC through a wireless network. The advantage of this framework is twofold.
Operators have an intuitive support for carrying out maintenance at their disposal,
while supervisors can view the maintenance history of a machine and check the
operators’ work remotely. In particular, the remote PC can be used to evaluate
whether the tasks have been carried out in accordance with the protocols, whether
the operator made a mistake, and whether the maintenance has been completed on
schedule.

The paper is organized as follows. The most relevant research carried out in the
field of AR for maintenance is reviewed in Section 2. Then, the developed framework
is described in Section 3, while an initial case study is presented in Section 4. The
paper ends with a discussion and an outlook on future developments.


2 Background 


AR technology has been successfully tested in the field of maintenance [3], and the
first industrial cases and applications are now emerging [4]. The advantage of
applying AR in this field is that it reduces the abstraction effort the operator needs to
understand the instructions. In fact, the instructions are represented by means of
virtual objects directly within the real world, so that paper manuals are no longer
required. Comparative tests have demonstrated that AR improves the operator's work
in some manual activities compared with other means of providing instructions. Tang
et al. demonstrated how AR reduces user errors during manual operations and the
mental workload needed to understand a given task [5]. Henderson and Feiner showed
that AR reduces the time needed to understand, localize, and focus on a task during a
maintenance phase [6]. In summary, AR increases the effectiveness of the operator's
activity and consequently speeds up the whole workflow.

Many AR applications conceived for conducting maintenance operations are based
on immersive visualization devices, such as Head Mounted Displays (HMDs). The first
research on maintenance using AR was carried out within the KARMA project [7],
which provided maintenance instructions for a laser printer through an HMD tracked
by an ultrasonic system. A case study in the automotive domain has been described in
[8] for the door-lock assembly into a car door, while an immersive AR support for a
military vehicle is proposed in [9]. Lastly, the immersive AR solution presented in
[10] enables the operator to manipulate maintenance information by means of an
on-site authoring interface. In this way, it is possible to record and share knowledge
and experience on equipment maintenance with other operators and technicians.

However, even though HMDs are an effective means of giving AR instructions, they
have ergonomic and economic issues that impede their wide deployment in industry.
They are a relatively expensive technology that does not provide a good compromise
between graphic quality and user comfort [11]. Moreover, they are unsuitable for
long periods of use, such as an entire working day [12].

Mobile devices are currently the most interesting support for AR applications.
Billinghurst et al. evaluated the use of mobile devices as AR support for assembly
purposes [13]. Klinker et al. created a versatile mobile AR solution for maintenance
in various scenarios and tested it in a nuclear power plant [14]. Ishii et al. also
tackled maintenance and inspection tasks in this very delicate environment [15]. On
the negative side, compared with HMDs, mobile AR reduces the operator's manual
freedom during use, since the device has to be held. For this reason, Goose et al.
proposed the use of voice commands to obtain location-based AR information during
the maintenance of a plant [16]. Nevertheless, from an industrial point of view, mobile
devices are currently the most attractive and promising solution for supporting
maintenance tasks by means of AR. They are powerful enough to provide
augmentation, and they are inexpensive and widely available thanks to high-volume
production for the mass market. In addition, since these devices are easy to handle
and carry and are part of everyday life, they are considered more socially acceptable
than HMDs.

This work aims at extending the use of AR in maintenance by monitoring the
operator’s activity from a remote location. Some research works have partially dealt
with this idea by integrating AR into tele-assistance. Boulanger demonstrated it by
developing an immersive system for collaborative tele-training on how to repair an
ATM [17]. Reitmayr et al., instead, integrated a simultaneous localization and mapping
(SLAM) system into an online annotation solution for unknown environments [18]. The
integration of mobile devices into maintenance activities increases the effectiveness
of remote assistance by experts, because it is as if the expert were collaborating
on-site.

The research described in this paper differs from the others because it presents a
framework that records the work done for later monitoring. In fact, the cited works
focus only on supporting and supervising the operator in real time, and they do not
allow his work to be recorded in order to check it afterwards. At the time of writing,
only Fite-Georgel et al. have proposed research with a similar approach to monitoring
the accomplished work [19]. Their solution is a system to check undocumented
discrepancies between the designed model of a plant and the final object. However, it
works only offline and it has not been conceived for maintenance purposes.


3 Framework Description 


The developed framework enables the visualization of maintenance instructions
through a mobile device and the remote monitoring of the accomplished work. In this
section, an overview of the framework is given first, and then the two main modules
that make up the system are described in detail.


3.1 Overview 


Figure 1 provides a schematic representation of the framework and shows its two
modules. The first is the Maintenance Module, which is based on an AR solution that
displays the instructions to carry out on a mobile device. The instructions, which are
traditionally provided by a paper manual, are stored in a database as digital
information and loaded automatically by the AR solution when they are required. For
each maintenance step, the module saves data about how the maintenance operation
is progressing and, if the device is connected to a Wi-Fi network, sends them to a
remote storage server.

The data stored on the server can be viewed at any time by means of the Monitoring
Module. In this way, a supervisor can check the entire maintenance history of a
specific machine.
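The data flow in the overview (save each step, send it when Wi-Fi is available) can be sketched as a small store-and-forward buffer. This is an illustration under assumptions: `StepUploader`, the `send` callable and the `is_connected` check are hypothetical stand-ins for the Wi-Fi client and the remote storage server, not the authors' implementation.

```python
from typing import Callable, Dict, List


class StepUploader:
    """Keeps completed-step records locally and forwards them when connected."""

    def __init__(self, send: Callable[[Dict], None],
                 is_connected: Callable[[], bool]) -> None:
        self._send = send                  # uploads one record to the remote server
        self._is_connected = is_connected  # e.g. a Wi-Fi availability check
        self.pending: List[Dict] = []      # saved but not yet uploaded records

    def step_completed(self, record: Dict) -> None:
        self.pending.append(record)        # always save locally first
        self.flush()

    def flush(self) -> int:
        """Upload pending records in order; return how many were sent."""
        sent = 0
        while self.pending and self._is_connected():
            self._send(self.pending[0])
            self.pending.pop(0)            # drop only after a successful send
            sent += 1
        return sent
```

Removing a record only after `send` returns means a failed upload (an exception from `send`) leaves it queued for the next attempt, so no step is lost while the tablet is out of Wi-Fi range.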


Fig. 1. Schematic representation of the framework (labels: traditional instruction manual;
instructions in digital format; AR instructions on tablet; Maintenance Module; Monitoring Module)
3.2. Maintenance


The Maintenance Module is essentially an application that provides a mobile AR
visualization of the instructions. In addition, a Wi-Fi client is integrated into the
application and sends the data about each accomplished step to the remote server.

Figure 2 shows the tasks of the Maintenance Module. The camera embedded in the
mobile device frames the machine that requires maintenance and provides a video
stream to the AR application. Specific algorithms estimate the position of the camera
with respect to the mechanical component from the video stream. This task, also
referred to as tracking, allows the module to render the virtual contents precisely in
the real world, with proper perspective and spatial coherence.

The instructions for the tasks to carry out are stored in configuration files and are
loaded during the initialization of the AR application. They consist of textual
information about how to perform each task and the spatial positions of the machine
components on which the operator should intervene. The instructions are rendered
graphically using the tracking data, and the graphic result is superimposed onto the
video stream and shown on the display of the device.

Once a step is finished, the module automatically saves the maintenance information
and makes it available to the remote PC through the Wi-Fi client.


Data Communication. Every time the operator presses the button to move to the next
maintenance task, the application saves the data of the operation just concluded and
sends them to the remote server. When scaled to several AR mobile maintenance
devices, this approach forms a cloud-computing network, which is referred to as the
cloud in this work. These data are two pictures of the machine at the end of the task
and additional textual information. The two pictures show the same view and differ
only in that one of them also shows the augmented content. In this way, the
supervisor can check the correctness of the operation by also comparing it with the
AR instructions.

Fig. 2. The tasks that the Maintenance Module performs in order to provide an AR visualiza-
tion and the data to the remote database (labels: estimation, tracking, rendering, AR
visualization, visualization client, data to cloud)
The other pieces of information are complementary data that complete the
description of the operation, as follows:

• Operator’s name
• Date
• Time
• Machine ID number
• Type of maintenance (ordinary, extraordinary, intervention for breakdown/error)
• Name of the maintenance operation
• Step number
• Step description
• Time taken to execute the task

These data are saved in a file and organized according to a simple XML-like
structure, so as to have an effective communication protocol for exchanging
information between the Maintenance and Monitoring Modules.
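The paper specifies only a "simple XML-like structure". As a sketch, assuming element names of our own choosing (they are hypothetical, not the authors' schema), one step record with the fields listed above could be written with Python's standard `xml.etree.ElementTree`; file names stand in for the two pictures.

```python
import xml.etree.ElementTree as ET
from typing import Dict


def step_record_xml(fields: Dict[str, object], picture: str, ar_picture: str) -> str:
    """Serialize one maintenance-step record; all element names are illustrative."""
    root = ET.Element("maintenance_step")
    for name, value in fields.items():
        ET.SubElement(root, name).text = str(value)
    pictures = ET.SubElement(root, "pictures")
    # The same view saved twice: without and with the augmented content.
    ET.SubElement(pictures, "picture", kind="plain").text = picture
    ET.SubElement(pictures, "picture", kind="augmented").text = ar_picture
    return ET.tostring(root, encoding="unicode")


# Hypothetical example record using the fields listed above.
example = step_record_xml(
    {"operator": "operator A", "date": "2014-01-15", "time": "10:15",
     "machine_id": "M-001", "maintenance_type": "ordinary",
     "operation": "daily cleaning", "step_number": 3,
     "step_description": "Turn off the main switch", "duration_s": 42},
    "step3.jpg", "step3_ar.jpg")
```

Using real XML rather than an ad hoc format lets the Monitoring Module rely on a standard parser instead of custom string handling.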


3.3. Monitoring 


The Monitoring Module consists of a software application that allows the supervisor
to check the maintenance data stored in the cloud. A parser retrieves the maintenance
data saved in the XML files and makes them available to the module. Then, a GUI
collects all the data and enables the supervisor to view and navigate through the
pictures and the maintenance information of each task.
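The parser step can be sketched in the same spirit, assuming hypothetical XML step records with one element per field and a `pictures` element holding the plain and augmented images (this schema is illustrative, not the authors'). Records are flattened into dictionaries and ordered by machine and step number, which is the order in which a supervisor would browse them in the GUI.

```python
import xml.etree.ElementTree as ET
from typing import Dict, List


def parse_step_records(xml_documents: List[str]) -> List[Dict[str, str]]:
    """Parse step records (hypothetical schema) and order them for browsing."""
    records = []
    for text in xml_documents:
        root = ET.fromstring(text)
        # One entry per simple field element, e.g. machine_id, step_number.
        record = {child.tag: (child.text or "") for child in root
                  if child.tag != "pictures"}
        # Pictures are keyed by kind, e.g. picture_plain / picture_augmented.
        for pic in root.iterfind("pictures/picture"):
            record["picture_" + pic.get("kind", "unknown")] = pic.text or ""
        records.append(record)
    records.sort(key=lambda r: (r.get("machine_id", ""),
                                int(r.get("step_number", "0"))))
    return records
```

Sorting numerically on the step number (rather than as text) keeps step 10 after step 9 when a long maintenance service is reviewed.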


4 Case Study 


The case study presented in this section describes an initial test of the framework in
the food industry. In particular, the machine used for the study is intended for food
packaging, and it requires particular care and periodic maintenance to ensure a safe
packaging process.

Several maintenance operations on this kind of machine must be carried out daily for
hygienic reasons. Food companies usually assign ordinary operators, without any
particular skills or knowledge about the machine, to take care of it. The reason lies in
the need to avoid constantly relying on a skilled operator, but this involves a higher
risk of uncertainty about the outcome of the operation.

The system developed according to the framework is described first, and then the
case study is presented.
4.1 System Description


The system used for the case study is described here according to the two modules
of the framework, taking into account both hardware and software components.


Maintenance. The Maintenance Module used by the operator is an AR application,
consisting of a GUI designed for maintenance purposes, which runs on a mobile
device. The mobile device used for this case study is a Windows-based tablet PC,
equipped with a 1.80 GHz processor, 2 GB of RAM, a 10.1-inch color touch screen,
and a 640×480 camera working at 30 Hz.

Once the operator has selected the appropriate maintenance service for the machine
through the GUI, the application provides him with the AR instructions. All the tasks
have been taken directly from the paper manual, while the machine components,
which are visible as augmented contents in the scene, have been exported from the
CAD model of the machine. Each set of instructions for a specific maintenance service
is saved in a separate file.

A very stable marker-based tracking solution has been chosen to detect the camera
pose and, subsequently, to properly represent the virtual content in the real
environment. Thus, tracking is performed by placing square black-and-white markers
on the machine. The tracking algorithms come from the ARToolkit Plus library [20].
These algorithms detect the markers placed in the environment and retrieve the
camera pose through geometric computations on the four corners of each marker in
the scene.
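ARToolkit Plus's internal pose algorithm is not described here. As a sketch of one standard building block of marker-based tracking (not the library's code), the four marker corners and their detected image positions determine a plane-to-image homography, which can be obtained by solving an 8×8 linear system (a direct linear transform with h33 fixed to 1). Recovering the full camera pose from it would additionally require the camera intrinsics. All function names are hypothetical.

```python
from typing import List, Sequence, Tuple

Point = Tuple[float, float]


def solve_linear(a: List[List[float]], b: List[float]) -> List[float]:
    """Solve a·x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [list(row) + [b[i]] for i, row in enumerate(a)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < 1e-12:
            raise ValueError("degenerate corner configuration")
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):  # back substitution
        s = m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = s / m[r][r]
    return x


def homography_from_corners(marker: Sequence[Point],
                            image: Sequence[Point]) -> List[List[float]]:
    """3x3 homography (h33 = 1) mapping the 4 marker corners to image corners."""
    a, b = [], []
    for (x, y), (u, v) in zip(marker, image):
        a.append([x, y, 1.0, 0.0, 0.0, 0.0, -x * u, -y * u])
        b.append(u)
        a.append([0.0, 0.0, 0.0, x, y, 1.0, -x * v, -y * v])
        b.append(v)
    h = solve_linear(a, b)
    return [h[0:3], h[3:6], [h[6], h[7], 1.0]]


def apply_homography(h: List[List[float]], p: Point) -> Point:
    """Map a point on the marker plane to image coordinates."""
    x, y = p
    w = h[2][0] * x + h[2][1] * y + h[2][2]
    return ((h[0][0] * x + h[0][1] * y + h[0][2]) / w,
            (h[1][0] * x + h[1][1] * y + h[1][2]) / w)
```

This also shows why the camera must always frame at least one marker: without four detected corners the system has too few equations to determine the mapping.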

The AR contents are visualized in the real world by merging the video stream of the
tablet camera with the virtual objects, which are rendered with the correct
perspective according to the tracking data. OpenSceneGraph¹ is the computer graphics
library used for this purpose, and it updates the visualization at every new camera
frame.

The interface has been specifically designed for AR use on a mobile device. Thus,
considerations about how best to represent and manage the AR content for the
operator have been taken into account. In fact, the system has to provide the
instructions for executing a task in a way that is simple to understand and interact
with. The guidelines presented in [21] have been used as a starting point. Figure 3
shows the result of applying the following considerations.

The first consideration regards the visibility of the virtual objects in the working
environment. The objects are 3D or 2D elements, and each of them has a precise
purpose, such as indicating the point on which the user has to work or showing the
action to perform. For this reason, animating them to show how to handle a
component can increase the user's understanding. In addition, these virtual elements
must be easy for the operator to recognize in the scene.


¹ OpenSceneGraph library: http://www.openscenegraph.org/
Fig. 3. The virtual objects present in the augmented visualization (on-screen labels include
"Turn the main switch to" and "Turn off")


Second, the instruction also has to be complemented by some textual information.
The purpose of the text is to give additional information that cannot be provided by
other graphical representations. The text is represented as a 2D element on the
Graphic User Interface (GUI) and also as a 3D element in a fixed position on the
machine. The text should be short, clear, and direct, so as to inform the operator
about the task to accomplish.

Finally, a way to manage the instructions has to be taken into account, since they
are represented as a sequence of tasks. With a step-by-step approach, the user
focuses on only one operation at a time. Therefore, two virtual buttons on the GUI
enable the operator to switch from one instruction to another.
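The step-by-step navigation, combined with the Data Communication rule that moving to the next task saves the step just concluded, can be sketched as a small state holder behind the two buttons. This is an assumption-laden illustration: `StepNavigator`, `next_step`, `previous_step` and the `on_step_done` callback are hypothetical names, and the callback stands in for saving and sending the step data.

```python
from typing import Callable, List, Optional


class StepNavigator:
    """Step-by-step instructions driven by two GUI buttons (previous / next)."""

    def __init__(self, steps: List[str],
                 on_step_done: Optional[Callable[[int, str], None]] = None) -> None:
        if not steps:
            raise ValueError("a maintenance service needs at least one step")
        self.steps = steps
        self.index = 0                    # currently displayed step (0-based)
        self.finished = False
        self.on_step_done = on_step_done  # e.g. save the step data and send it

    @property
    def current(self) -> str:
        return self.steps[self.index]

    def next_step(self) -> bool:
        """Report the current step as done, then advance; False when nothing follows."""
        if self.finished:
            return False
        if self.on_step_done is not None:
            self.on_step_done(self.index + 1, self.current)  # 1-based step number
        if self.index + 1 < len(self.steps):
            self.index += 1
            return True
        self.finished = True
        return False

    def previous_step(self) -> None:
        """Go back one step, e.g. to review an earlier instruction."""
        if self.index > 0:
            self.index -= 1
```

In this sketch, going back and pressing "next" again reports the revisited step a second time, so the supervisor would see that the operator repeated it; a different policy could suppress such duplicates.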
Fig. 4. The Graphic User Interface for the remote monitoring
Monitoring. The Monitoring Module is a Windows application that runs on a desktop
PC. As shown in Figure 4, the application provides a GUI to visualize the pictures,
which were automatically taken by the Maintenance Module, and to show all the
complementary maintenance data in a textbox. Finally, a button enables the
supervisor to switch from the normal to the AR visualization of the picture.


Fig. 5. Maintenance of the machine. The user frames the machine with the tablet (left) and 
looks at the instruction by means of the tablet (center and right). 


4.2. Maintenance Service on a Machine 


The case study is a simulated maintenance service on a machine from the food
industry. For safety reasons, the machine was not installed or operational, and none
of the required utilities (water and electricity) were connected. The study covers an
ordinary and an extraordinary maintenance service. As depicted in Figure 5, the user
was asked to select the proper operation through the GUI on the tablet and to
accomplish it by following the AR instructions. The user involved did not have any
experience with the machine.

Figure 6 shows some screenshots taken during the maintenance simulation. The
markers placed on the machine allow the user to move in the environment and to
experience the AR visualization from both close and far distances. At the same time,
a supervisor who knows the maintenance tasks of the machine sits in front of the
remote PC and views the user’s work. This interface turned out to be a very effective
means of checking the work and finding possible mistakes.

Fig. 6. On the left, the remote user checks the work of the operator. On the right, a screenshot
of the interface.


5 Discussion 


The case study shows the benefit of the framework for supporting and monitoring
maintenance operations. In fact, AR makes it possible to avoid both traditional
instruction manuals and training operators on how to perform maintenance correctly.
Visualizing the instructions in context on the machine makes the operator’s work
easier, and potentially faster and more precise than work carried out with traditional
means. Thus, experts are needed less often. In addition, the use of AR in maintenance
tasks introduces mobile devices into industrial settings, where they can also be used
for other purposes. In this case, the mobile device has been used to send information
to a remote PC about how the maintenance process is progressing. Therefore, a
company can increase its control over the operator’s work and collect the maintenance
history of the machines in its factory. For these reasons, the proposed framework
meets the expectations of increasing the effectiveness of maintenance tasks through
AR and remote supervision.

The only drawback noticed during the case study is related to the tracking
technology used in the AR application. Markers provide very stable and precise
tracking, but they have to be placed onto the machine and its components, which
requires a time-consuming procedure for fixing and calibrating them. In addition, the
camera must always frame at least one marker to estimate the camera pose. Camera
tracking methods that are not based on markers could be used to overcome this
problem. Some current methods are able to estimate the pose from distinguishable
geometrical features that are already present on the mechanical component and in
the environment, without placing any marker. These methods are usually called
marker-less or natural-feature tracking. Examples of these tracking technologies can
be found in the automotive field [22] and in aeronautic maintenance [23].


6 Conclusion 


This paper presents a framework for supporting maintenance tasks in industrial
environments. The framework provides a method to represent instructions through
mobile AR that eases the operator’s work. In addition, the framework introduces a
new way to monitor the worker. The mobile device that provides the AR instructions
records information about the performed maintenance steps and makes it available
on a remote PC. Thus, a supervisor can check the maintenance activity from the
remote PC.

In the future, new technologies will be integrated in order to provide more advanced
tracking and interaction systems. Moreover, the potential of cloud computing will be
investigated by making the data exchange between the mobile device and the remote
PC bidirectional. The supervisor will then be able to collaborate with the operator by
sending notes or alerts that are automatically visible to the operator in the
augmented environment.
Abstract. Virtual reality is a key technology for designing products with complex
human-product interactions. This paper deals with the development of a product
design method for complex human-product interactions using virtual reality (VR)
technology. The VR method uses graph theory to measure the complexity of the
designed product on the basis of human task analysis, which records and analyzes
the human-product interactions within an immersive simulation session. The proposed
method is tested on a realistic aerospace case.


Keywords: Product Design, Product Complexity, Immersive Environment, Vir- 
tual Prototyping. 


1 Introduction 


Modern designs of traditional products (e.g. aircraft) have become more and more
complex due to the constantly growing demand for regulations and standards imposed
by the globalized nature of their markets. Most products manufactured today involve
some kind of interaction with humans. The human may be the end-user of the product
(e.g. an airline passenger), the operator of the product (e.g. an aircraft pilot), a worker
involved in its manufacturing (e.g. a worker on an aircraft assembly line), or the
technician/engineer responsible for its maintenance (e.g. aircraft maintenance tasks).
All these aspects of human-product interaction define a vast number of factors that
need to be taken into account during product design. Furthermore, some products
undergo heavy automation (e.g. commercial aircraft), which further increases the
complexity of human-product interaction. Virtual reality is a key enabling technology
for designing products with complex human-product interactions. The study presented
in this paper aims at developing a product design method for complex human-product
interactions based on virtual reality (VR) technology. VR enables the simulation of
human factors during product design, in their full context, with high flexibility and
reusability [1], [2]. Furthermore, it provides high flexibility and cost efficiency during
the early phases of product design. Since VR enables the simulation of human tasks in
full context, it can provide an ideal platform for measuring product complexity by
analyzing the human tasks performed while the product is used.
Product design with the available CAD systems offers a perception of a 3D model’s
parameters such as shape, color and kinematics; nevertheless, the need for real-time
human interaction is not satisfied. VR technology allows engineers/designers to
interact, to a great extent, with the 3D model in an immersive environment and
enables the testing, experimentation and evaluation of the product in full context.
This technology can be considered an extension of conventional CAD tools, since it
further extends the integration of the human with the product in its environment.
Therefore, VR technology offers great added value in the early design phases of
complex products, by means of testing and simulating them. However, a question that
arises is whether, besides testing and simulating a product in a virtual environment,
VR can also measure its complexity and provide a usable metric that could help
engineers and designers improve their design.
According to [3], a good design is one that satisfies all functional requirements with
a minimum number of components and relations. In addition, a simple design is
preferable to a complex one [4]. Complexity should therefore be minimized during
design. Different views on the definition of complexity have been collected in the
literature. In product design, from an assembly point of view, the predominant
definition of complexity is the interconnection of parts [5]. The information aspect of
complexity suggests that complexity is a measure of the minimum amount of
information required to describe the given representation [3, 6]. Complexity could
also be stated as a measure of entropy (randomness) in a design [7] and as a measure
of the number of basic operations required to solve a problem [8]. A more generic
perspective is that complexity can be defined as an intersection between elements and
attributes that complicates the object in general [7]. In [9], a complex system is
defined as one comprising a large number of parts that interrelate in a non-simple
manner. Approaches to reducing complexity can also be found in the literature,
ranging from methodologies for the reduction of assembly complexity [5] to
approaches leading to product simplification [10].

Complexity measures can be categorized by what is evaluated, the basis of the
measure, the method, and the type of measure. Among the existing complexity
measures, the most common types are size, coupling and solvability complexity [11].
Size complexity measures focus on product elements such as the number of design
variables, functional requirements, applied constraints and subassemblies. Size
complexity measures are usually developed from information that primarily derives
from entropic measures of a representation. The complexity of a design can be
measured as the cumulative reduction of entropy at each step of the design process;
thus, a more complex design requires a larger reduction in entropy. Coupling
complexity measures refer to the strength of interconnection among the elements of a
design product, problem or process. The measured elements need to be represented in
graph format, and coupling complexity is, in most cases, measured by the
decomposability of this graph representation. Finally, solvability complexity
measures indicate whether the product design can be predicted to satisfy the design
problem. Solvability is also referred to as the difficulty of the design process to result
in the final design. This difficulty can be measured as the time required to design a
product or the number of steps to be followed for its completion.
In [11], a comparison of complexity measures is presented based on the existing
literature. The main variables of this comparison are the focus of complexity
evaluation (i.e. design process, design problem, design product), the basis
(computational/algorithmic analysis, information-based, and traditional design), the
focus of measurement (size, coupling and solvability), the interpretation (objective,
subjective) and, finally, whether an absolute or relative metric is used.

This paper presents a VR method, developed for complex product design, that
records and analyzes the human-product interactions within an immersive simulation
session and evaluates the product’s coupling complexity. The VR framework is based
on graph theory methods for measuring a product’s coupling complexity. The
complexity score is generated automatically by analyzing the function structure and
bipartite graphs of the human-product interactions.


2 Complexity Evaluation Method 


The coupling complexity measure of a product can be defined as the measure of
interconnections between a product’s variables at any level. The coupling measure
used here is thoroughly described in [12]. The process requires the design to be
represented in graph format, where the tasks are depicted as nodes connected by
simple lines that represent dependencies. The method decomposes the product’s graph
representation: relations are removed until the graph separates into sub-graphs, and
the coupling in each of them is then measured. The algorithm (see Fig. 1) aims to
decompose the graph to the utmost extent, and each decomposition step is driven by a
check of the graph’s connectivity. The algorithm begins by removing unary relations
and recording the remaining variables. After this point, the algorithm is applied
repeatedly to all sub-graphs produced from the initial one. A running total is kept to
measure the interconnectedness of the graph, finally yielding a numerical value for
the product’s coupling complexity.

The current study applies this algorithm inside a virtual environment. The calculation
is based on graphs generated from the interactions that the human user performs with
the product inside the virtual environment. The representation method chosen in this
case is the function structure graph, which appears to be the most appropriate for
engineering systems. The function structure graph is a block-based diagram used for
the analysis of engineered systems, representing the relations among the different
functions of a product (see Fig. 2). The relations used to represent a problem are of
three basic types, namely Function-Function (F-F), Input-Function (I-F) and
Function-Output (F-O). These are also referred to as primitive relations (operators),
while the building blocks of the graph are the primitive modules (operands).
Complexity is usually correlated with the type of representation. The coupling
complexity in a function structure graph is visualized by the interconnectedness of
the functions in a product.
1. Eliminate unary relations
2. Level = 1, Total = 0
3. FOR each (sub)-graph:
   a. Size = 1
   b. FOR all combinations of relations in Size:
      i.   Remove Size relations
      ii.  Check for separation
      iii. IF Separation = TRUE THEN mark relation as removed
   c. IF no relation removed THEN Size = Size + 1 AND go to step 3b
   d. FOR all relation sets marked:
      i.   Find combination of Sets > Remove "MAX(relations)" AND "Duplicate = FALSE"
      ii.  Total = Total + Level*Size*Sets
   e. Level = Level + 1


Fig. 1. Pseudo-code for bi-partite graph decomposition [11] 
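The decomposition in Fig. 1 can be sketched in Python as follows. This is a minimal sketch under stated assumptions, not the authors' implementation: the bi-partite graph is a simple graph given as (left node, right node) pairs; a "unary relation" is assumed to be an edge with a degree-1 endpoint; "separation" means that the number of connected components increases; and step 3d is approximated by greedily keeping minimal cut sets that share no relation. All function names are hypothetical. The exhaustive search over combinations is exponential, so the sketch is only suitable for small graphs.

```python
from itertools import combinations

Edge = tuple[str, str]  # (left node, right node) of the bi-partite graph
Component = tuple[set[str], list[Edge]]


def components(nodes: set[str], edges: list[Edge]) -> list[Component]:
    """Split a graph into its connected components (node set, edge list)."""
    adj: dict[str, set[str]] = {n: set() for n in nodes}
    for a, b in edges:
        adj[a].add(b)
        adj[b].add(a)
    seen: set[str] = set()
    result: list[Component] = []
    for start in sorted(nodes):  # sorted for a deterministic order
        if start in seen:
            continue
        comp: set[str] = set()
        stack = [start]
        while stack:
            n = stack.pop()
            if n not in comp:
                comp.add(n)
                stack.extend(adj[n] - comp)
        seen |= comp
        result.append((comp, [e for e in edges if e[0] in comp]))
    return result


def eliminate_unary(edges: list[Edge]) -> list[Edge]:
    """Step 1. Assumption: a unary relation is an edge with a degree-1 endpoint."""
    degree: dict[str, int] = {}
    for a, b in edges:
        degree[a] = degree.get(a, 0) + 1
        degree[b] = degree.get(b, 0) + 1
    return [(a, b) for a, b in edges if degree[a] > 1 and degree[b] > 1]


def minimal_cut_sets(nodes: set[str], edges: list[Edge]) -> tuple[int, list[set[Edge]]]:
    """Steps 3a-3c: smallest Size whose removal separates the sub-graph."""
    before = len(components(nodes, edges))
    for size in range(1, len(edges) + 1):
        cuts = [set(c) for c in combinations(edges, size)
                if len(components(nodes, [e for e in edges if e not in c])) > before]
        if cuts:
            return size, cuts
    return 0, []


def coupling_complexity(edges: list[Edge]) -> int:
    """Accumulate Total = sum of Level * Size * Sets over all decomposition levels."""
    edges = eliminate_unary(edges)
    nodes = {n for e in edges for n in e}
    pending = [c for c in components(nodes, edges) if c[1]]
    level, total = 1, 0
    while pending:
        next_pending: list[Component] = []
        for comp_nodes, comp_edges in pending:
            size, cuts = minimal_cut_sets(comp_nodes, comp_edges)
            # Step 3d (assumption): greedily keep cut sets that share no relation.
            chosen: list[set[Edge]] = []
            used: set[Edge] = set()
            for cut in cuts:
                if not cut & used:
                    chosen.append(cut)
                    used |= cut
            total += level * size * len(chosen)
            rest = [e for e in comp_edges if e not in used]
            next_pending += [c for c in components(comp_nodes, rest) if c[1]]
        pending = next_pending
        level += 1
    return total
```

For example, a single 4-cycle `[("v1", "c1"), ("v2", "c1"), ("v2", "c2"), ("v1", "c2")]` cannot be separated by removing one relation, but two disjoint pairs of relations each separate it, so this sketch returns 1 · 2 · 2 = 4.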


[Fig. 2 legend: input/output variables; functions; relations (types: input-function, function-function, function-output)]


Fig. 2. Function structure example representation [12]


Following the definition of the function structure graph, a bi-partite graph (see Fig. 3)
is used as the basis for decomposition. This graph is composed of left-hand and
right-hand nodes, which are the entities and constraints, respectively. The lines
connecting them are the relations derived from the function structure graph. To obtain
the final coupling complexity score C, the bi-partite graph is decomposed, to its fullest
extent, into several sub-graphs. The score is accumulated over the iterative
decomposition according to equation (1), where n is the number of decomposition
steps, the index of the iteration step is the level (l), the minimum number of relations
that must be removed to separate a sub-graph is the size (s), and the actual number of
removed entities is the number (b).


$$C = \sum_{i=1}^{n} l_i \cdot s_i \cdot b_i \qquad (1)$$
[Fig. 3 panels: bi-partite graph with variables/functions and constraints connected by relations; removal of Size=1 relations; resulting sub-graph]

Fig. 3. Bi-partite decomposition method [12]


The coupling complexity of a product design is one aspect of design complexity.
Different representations are available in the literature, as well as various algorithms
for graph analysis. The method selected for this study relies on the functions involved
in the user-product interaction. Function decomposition does not take into account the
relationships between the functions through input and output associativity, but it
provides a realistic evaluation of complexity while remaining less
representation-dependent than other methods (e.g. size complexity).


3 VR Design Method 


The VR method developed aims at measuring a product’s coupling complexity by
monitoring the human-product interaction within an immersive virtual environment.
The main philosophy of this development is to enable the human user to perform all
natural operations and procedures with a product and, at the same time, to generate
the function structure graphs used to evaluate the complexity of the product at hand.
The proposed VR method uses an algorithm that generates the function structure graph
based on the human user’s motion and interactions (i.e. collisions) with the elements of
the virtual product. As depicted in Fig. 4, the architecture of the proposed VR method,
as implemented for the use-case described in Section 4, uses a repository of the
product elements in the virtual environment and a repository of the tasks carried out
by the human. These repositories are currently used to generate the function structure
graph; in a future study, they can be replaced by semantic ontologies that allow further
reasoning and the generation of more complex function structure graphs. Human
motion is tracked by attaching 3D objects to the user’s virtual hand in order to detect
collisions with the various elements of the product (cockpit).


460 L. Rentzos et al. 


[Fig. 4 diagram; output: coupling complexity score]


Fig. 4. Architecture of the proposed VR method


The human task analysis (HTA) capabilities rely primarily on a hierarchical
categorization of the user (pilot) inside the virtual environment (cockpit). The input is
the task to be performed by the user (e.g. a flight procedure performed by the pilot).
The HTA module of the VR method starts by generating the function structure graph
based on the elements that the user interacts with, which are stored in a repository in
the form of an array. Each component of the product corresponds to a certain
functionality.

As far as the implementation of the function structure graph in VR is concerned, the
first step is to determine the number and type of every variable to be included in the
graph. The function structure graph has three types of variables, namely input,
function and output. Every value concerning the graph generation is stored and
handled in an array, which has three corresponding rows in which the variables are
stored. The relations derived from the interactions that the user performs with the
virtual environment are also stored in the array. Specifically, according to the user’s
type of interaction (hand, eye or camera tracking), the algorithm stores the appropriate
types in the input row. The engine recognizes the user’s interaction with the virtual
environment (Boolean checks: collision detection TRUE/FALSE, ray cast
TRUE/FALSE), and because the function of every element is stored in a product
element database, the engine stores the element’s outcome in the output row. The
connections are stored in a similar manner. For example, after a collision with an
element is detected, the engine registers the human hand in the first input cell, the
human motion in the first function cell, and the relation between them in the format
(cell, cell, 1), where the number 1 represents the relation statement. This number is
incremented when the first element of the connection is used again. The engine is
configured to avoid duplication of input variables. For example, the user’s hand
variable and the human motion should exist only once, and only after the interaction
type has been identified as a collision detection performed with some kind of human
motion. At the same time, the HTA module updates the human tasks repository based
on the tasks/actions performed by the user in the environment. The human tasks are
stored in an array and act as the relations between the human user and the product. A
key logger function distinguishes and keeps track of every element the user interacts
with in the virtual environment.
In addition, the human task repository is updated for further task evaluation. An array
stores, in hierarchical order, the product elements that the user must interact with in
order to perform a distinct task. In case of an error, the user is notified in the virtual
environment of the correct element to interact with, or of the correct manner of
interaction, considering the value set or the kinematics of the element.
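The storage scheme described above (three variable rows, no duplicated variables, relations stored as (cell, cell, number)) can be illustrated with a small Python sketch. It is hypothetical: the class and function names are invented for illustration, the element function and outcome strings stand in for lookups in the product element database, and the relation number is interpreted as a counter of how often the first cell of a connection has been used.

```python
from dataclasses import dataclass, field

Address = tuple[str, int]                # (row name, column index)
Relation = tuple[Address, Address, int]  # (first cell, second cell, number)


@dataclass
class FunctionStructure:
    """Hypothetical three-row array: input, function and output variables."""
    rows: dict[str, list[str]] = field(
        default_factory=lambda: {"input": [], "function": [], "output": []})
    relations: list[Relation] = field(default_factory=list)

    def cell(self, row: str, name: str) -> Address:
        """Return the address of a variable, storing it only once (no duplicates)."""
        if name not in self.rows[row]:
            self.rows[row].append(name)
        return row, self.rows[row].index(name)

    def relate(self, first: Address, second: Address) -> None:
        """Store (first, second, n), where n counts how often `first` has been used."""
        n = 1 + sum(1 for f, _, _ in self.relations if f == first)
        self.relations.append((first, second, n))


def record_collision(fs: FunctionStructure, element_function: str, outcome: str) -> None:
    """Hypothetical callback invoked when the virtual hand collides with an element."""
    hand = fs.cell("input", "human hand")
    motion = fs.cell("function", "human motion")
    if not any(f == hand and s == motion for f, s, _ in fs.relations):
        fs.relate(hand, motion)  # the hand -> motion relation is stored only once
    func = fs.cell("function", element_function)  # assumed element-database lookup
    fs.relate(motion, func)
    fs.relate(func, fs.cell("output", outcome))
```

After two collisions (e.g. retracting the landing lights, then disarming the spoilers), the hand and the human motion are each stored once, and the second relation leaving the human-motion cell carries the number 2.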

After the function structure graph has been generated, the algorithm extracts the
bipartite graph and starts its decomposition. The function structure graph and the
bipartite graph are the basis for implementing the decomposition algorithm. The first
aspect to be examined is the way in which the function structure graph is generated
and at what level of detail: for the complexity results to be accurately compared, the
function structure graphs need to be generated identically. After the relations between
the variables have been stored in the relation row, the next step is to translate the
coupling complexity algorithm proposed for graph decomposition into an
array-handling engine, adapting it to handle the arrayed data. First, the third row of
the existing table is reformed into a separate table for easier handling of its elements.
This new table comprises three rows: the first holds the address of the first cell of each
relation, the second holds the address of the second cell, and the third holds the
connection number. The implementation described above follows the pseudo-code
presented in Section 2.
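The reshaping of the relation row into a separate three-row table can be sketched as below. This is an illustrative assumption about the data layout, not the authors' code: relations are taken to be (first cell, second cell, connection number) triples with cell addresses of the form (row name, column index), and `relations_of` is a hypothetical helper showing why the column layout is convenient when relations touching a cell must be found and removed during decomposition.

```python
Address = tuple[str, int]                # (row name, column index), e.g. ("input", 0)
Relation = tuple[Address, Address, int]  # (first cell, second cell, connection number)


def to_relation_table(relations: list[Relation]) -> list[list]:
    """Reform relation triples into one table with three rows:
    row 0 = address of the first cell, row 1 = address of the second cell,
    row 2 = connection number. Column j describes the j-th relation."""
    table: list[list] = [[], [], []]
    for first, second, number in relations:
        table[0].append(first)
        table[1].append(second)
        table[2].append(number)
    return table


def relations_of(table: list[list], address: Address) -> list[int]:
    """Column indices of all relations that touch a given cell."""
    return [j for j in range(len(table[0])) if address in (table[0][j], table[1][j])]
```

Keeping one relation per column means that removing a relation during decomposition amounts to dropping the same index from all three rows.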


4 Aerospace Use-Case: Aircraft Cockpit 


A realistic use-case from the aerospace industry, namely aircraft cockpit design, is
used to demonstrate the applicability and value of the developed VR method. Aircraft
cockpits are highly complex products with a very high degree of human interaction
under all operating conditions. The proposed VR method offers an easy-to-use way of
evaluating the complexity of a cockpit design by performing flight procedures in a
virtual environment. The use-case presented is based on a simple procedure, from
which the data necessary for the evaluation of complexity are extracted. In the virtual
environment, the user interacts with the cockpit in order to perform the set procedure.
While performing each task, the user is monitored by the VR algorithm described in
the previous section.
System               Value/State
Land Lights          Retract
Spoilers             Disarm
Flaps                Retract
Eng Mode             Norm
ATC                  Stby/off
TCAS                 Stby
Anti-ice             Off
APU                  Start
Radar                Off
Windshear System     Off
Brake temperature


Fig. 5. Actions performed in cockpit during the “After landing” procedure [13]


The procedure selected for this use-case is the “AFTER LANDING” procedure from
the Flight Crew Operations Manual of a commercial airliner. The “AFTER
LANDING” procedure is an eleven-task procedure included in the standard operating
procedures and is performed immediately after landing (see Fig. 5). This procedure
was selected from among several candidates because of the high number of the pilots’
interactions with physical objects and the low need for communication with air traffic
control (ATC). During the procedure, the user needs to interact with two levers, three
toggle switches, three rotation knobs and one display. The user is expected to interact
with the elements in the predefined order and set the necessary values or states. Where
a task requires the user to interact with a display or checklist, the human user is
considered to be the output variable.

After the procedure has been carried out, the VR method generates the function
structure graph and the corresponding bipartite graph, as depicted in Fig. 6. For this
particular procedure, the graph consists of two input variables, twelve function
variables and eleven output variables. The coupling complexity algorithm yields a
score of 46 for this use-case (level=1, size=1, number=2 and level=2, size=2,
number=11).
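Writing C for the coupling complexity score, substituting the two reported decomposition steps (level l, size s, number b) into the sum of equation (1) reproduces the score of 46:

```latex
C = \sum_{i} l_i \, s_i \, b_i
  = \underbrace{1 \cdot 1 \cdot 2}_{\text{level } 1}
  + \underbrace{2 \cdot 2 \cdot 11}_{\text{level } 2}
  = 2 + 44 = 46
```

The second level dominates the score: eleven relation sets of size two removed at level two contribute 44 of the 46 points.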
Fig. 6. Function structure graph (left) and bi-partite graph (right) for the “after landing” procedure


5 Conclusions 


In this paper, a VR method for evaluating the complexity of product designs is
proposed. It enables the evaluation of complexity while the tasks are executed in a
natural way. The paper discusses the development and use of a complexity calculator
in a virtual environment to support fast and efficient development during the early
product design phases. The proposed VR method is implemented on a realistic
aerospace use-case: the human user performs a normal flight procedure in the virtual
environment, while the tool calculates the coupling complexity of this particular
procedure.

Future work on the VR method presented will consider additional complexity
measurement techniques inside a virtual environment, so that more aspects of product
complexity can be evaluated under a single metric. In addition, a semantic
implementation of this VR method will allow advanced reasoning capabilities during
the human task analysis and will provide the means for increasing the level of detail in
the evaluation of complexity.
Abstract. The paper summarizes experiences from applied research in visual
computing for the maritime sector. It starts with initial remarks on Augmented Reality
in general and on the specific boundary conditions of the maritime industry. The focus
is on the presentation of various concrete AR applications that have been implemented
for use cases in maritime engineering, production, operation and retrofitting. The paper
closes with remarks on future research in this area.


Keywords: Augmented Reality, Mixed Reality, Applied Research, Maritime 
Industry, Mobile Systems. 


1 Introduction and Motivation 


In this first section, we will give a short overview of Augmented Reality (AR) sys- 
tems as well as the maritime sector that forms the application background of our ap- 
plied research work. 


1.1 AR Building Blocks 


Augmented Reality can be understood as the confluence of computer graphics and 
computer vision. Azuma characterizes an AR system by the following three aspects 
[1]: AR (1) combines real and virtual objects in a real environment, (2) runs interac- 
tively and in real time, and finally (3) registers/aligns real and virtual objects with 
each other. If we translate this characterization from an end user’s view to the view of 
the developer of an AR system, Azuma has identified three major building blocks [2] 
that are based upon a selection of basic technologies: 


• Sensing/Tracking: Determine the position and orientation of the head or a mobile
device and track them in real time. This is often done by fusing various sensors such
as cameras, gyroscopes and/or acceleration sensors.

• Registration: Derive real-world coordinates as a prerequisite for mixing real and
virtual objects.

• Augmentation: Blend real and virtual images and graphics. This encompasses the
display and/or visualization techniques.
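How the three building blocks fit together can be illustrated with the standard pinhole camera model: tracking delivers the camera pose (R, t), and registration uses that pose to place a virtual 3D point at the correct pixel, where augmentation then draws it. The sketch below is generic and not taken from any specific AR framework; it assumes an ideal camera without lens distortion, and the function name and intrinsic parameters (fx, fy, cx, cy) are illustrative.

```python
def project(point_world, R, t, fx, fy, cx, cy):
    """Project a 3D world point to pixel coordinates with an ideal pinhole camera.

    R (3x3, list of rows) and t (3-vector) map world coordinates into camera
    coordinates, X_cam = R * X_world + t; this is the pose that tracking estimates.
    fx, fy are focal lengths and (cx, cy) the principal point, all in pixels.
    """
    x, y, z = (sum(R[i][j] * point_world[j] for j in range(3)) + t[i] for i in range(3))
    if z <= 0:
        return None  # point lies behind the camera, so nothing is drawn
    return fx * x / z + cx, fy * y / z + cy
```

With an identity rotation and zero translation, a point 2 m straight ahead lands on the principal point; any error in the tracked pose shifts the virtual object away from its real counterpart, which is why tracking quality directly limits registration accuracy.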


R. Shumaker and S. Lackey (Eds.): VAMR 2014, Part II, LNCS 8526, pp. 465 2014. 
© Springer International Publishing Switzerland 2014 
Due to the enormous boost in (mobile) graphics and the upcoming commercial
availability of ergonomic AR devices such as Google Glass, AR has gained
outstanding attention from the research community as well as from the general public.


1.2 Maritime Context


Implementing visual computing technology in the maritime domain differs in various
ways from applications in sectors such as automotive, military, medical or cultural
heritage. These latter sectors are the typical application fields for prototyping new
interactive IT applications. This situation can be explained by the high R&D spending
of medical, automotive and military companies on the one hand, and by sectoral
funding schemes for cultural heritage, especially from the European Commission and
other public funding bodies, on the other.

Compared to those other sectors, the maritime sector forms a niche market, which is
typically characterized by relatively low R&D intensity, many small and medium-sized
companies, and a conservative attitude [3]. However, awareness of the maritime
industry as an important economic sector is constantly rising. Its individual sectors
cover a broad range, from cruise tourism through offshore oil and gas to fishery and
seaborne transport. This is underpinned by the following quote from the European
Commission’s Blue Growth strategy [4]: “If we count all economic activities that
depend on the sea, then the EU's blue economy represents 5.4 million jobs and a gross
added value of just under €500 billion per year. In all, 75% of Europe’s external trade
and 37% of trade within the EU is seaborne.”

Besides these economic differences, there are also various technical differences we
have to cope with when implementing IT solutions for the maritime sector. These
boundary conditions can be summarized, though not exhaustively, as follows:


• We typically find harsh environments, where mobile applications are faced with
water and pressure (in underwater settings), dust and flying sparks (shipyards),
splash water, extreme temperatures and heavy movements due to waves (aboard a
ship).

• For underwater applications, we need to consider specific physical effects such as
refraction, deflection and attenuation, which affect optical tracking as well as the
registration process and visualization [5]. Color correction and distortion correction
must be integrated into the AR solutions.

• Connectivity to networks such as a global navigation satellite system (e.g. GPS) or
wireless data networks (3G) is sometimes limited, or completely shielded, on the open
sea and under water. Satcom networks are much more expensive, and acoustic
underwater communication offers only quite small bandwidth and high delay.

• A ship is not fixed but will roll, pitch and yaw. Depending on the use case, we
therefore have to determine both the movement of the ship relative to the earth and the
movement of a person (or device) relative to the ship.

• The dockyard hall as well as the outer hull of a ship and even internal structures
such as pipes and frames are quite uniform and offer few characteristic visual
features. When we see only a detail of the overall structure, it is often impossible to
identify the correct position of the visible section.

• Ships can be very big, not only in terms of physical dimensions but also in terms of
data volume. We can roughly estimate that a large ship has 10 times more parts than a
plane and 100 times more parts than a car.

• Furthermore, there are some economic and organizational particularities to mention:

  - Ships or offshore installations are not built in large volumes but as unique copies
    or in small series of two to five ships. Economies of scale are hard to achieve here.

  - The design systems used in the maritime industry are very production-focused.
    Various CAD systems are used only in the maritime industry, and possibly in plant
    design, but nowhere else. Their native APIs and data formats demand very specific
    know-how when interfacing with this IT environment.
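The point above about the moving ship can be made concrete: the pose of a device in earth coordinates is the composition of the ship's pose relative to the earth with the device's pose relative to the ship. The sketch below chains two 4x4 homogeneous transforms in plain Python. It is an illustration under assumptions, not part of the systems described here: the yaw-pitch-roll (Z-Y-X) rotation convention and all function names are choices made for the example, and in practice the two poses would come from different sources (for instance ship motion sensors and on-board optical tracking).

```python
import math

Matrix = list[list[float]]


def matmul(a: Matrix, b: Matrix) -> Matrix:
    """Plain matrix product of two nested-list matrices."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b))) for j in range(len(b[0]))]
            for i in range(len(a))]


def pose(yaw: float, pitch: float, roll: float, x: float, y: float, z: float) -> Matrix:
    """4x4 homogeneous transform; rotation R = Rz(yaw) * Ry(pitch) * Rx(roll)."""
    cy, sy = math.cos(yaw), math.sin(yaw)
    cp, sp = math.cos(pitch), math.sin(pitch)
    cr, sr = math.cos(roll), math.sin(roll)
    rz = [[cy, -sy, 0.0], [sy, cy, 0.0], [0.0, 0.0, 1.0]]
    ry = [[cp, 0.0, sp], [0.0, 1.0, 0.0], [-sp, 0.0, cp]]
    rx = [[1.0, 0.0, 0.0], [0.0, cr, -sr], [0.0, sr, cr]]
    r = matmul(rz, matmul(ry, rx))
    return [r[0] + [x], r[1] + [y], r[2] + [z], [0.0, 0.0, 0.0, 1.0]]


def device_in_world(world_T_ship: Matrix, ship_T_device: Matrix) -> Matrix:
    """Chain ship motion (relative to earth) with device pose (relative to ship)."""
    return matmul(world_T_ship, ship_T_device)
```

For example, a ship at 100 m east with a heading rotated by 90° and a device 5 m along the ship's y-axis and 1.5 m up yields a device position of roughly (95, 0, 1.5) in earth coordinates. Keeping the two transforms separate means that ship roll, pitch and yaw can be updated without disturbing the tracking of the person relative to the ship.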


1.3 Related Work


Plenty of work has been done in various application fields of AR (see Fig. 1). Most
publications present research in medical training or assistance [6, 7, 8], the
automotive sector [9] and the aerospace industry [9, 10]. Less widespread, but also
visible in the research community, is work in ambient assisted living [11] and in
architecture and civil engineering [12]. A new trend is AR in production in the context
of Industry 4.0 and Cyber Physical Systems [13, 14].
Fig. 1. Number of scientific publications with “Augmented Reality” or “Mixed Reality”
in the article title, abstract or keywords. (Source: Scopus; generated February 2014)


These publications show that a large number of use cases have been addressed by
(prototypic) AR systems. However, it is nearly impossible to transfer those systems
easily to other sectors. All of these solutions are designed to support a very limited
task and do not offer the flexibility to adapt to similar settings. Most of the
applications need an environment that is controlled with respect to lighting conditions
or fast movements, and even their limited scope is the result of a time-consuming
development process. Consequently, very few commercial applications of AR exist
outside R&D.
From these findings, we can argue that it is still necessary to concentrate on a
particular domain in order to develop AR systems that fulfill the needs of a certain
community under its specific boundary conditions.

There is a small number of publications that explicitly address maritime use cases of
Augmented Reality. Among them, we find navigation, watchkeeping and naval warfare
[15-19], maintenance of ships [20], diver support [21], ship production [22], and two
surveys [23, 24].


2 Our Examples


The following Augmented Reality applications in different phases of the lifecycle 
have been developed by Fraunhofer IGD so far: 


2.1 Sales and Marketing 


Use Case. Especially in the maritime domain, AR is still an eye-catcher at trade fairs
or in showrooms. Such an application neither has to be integrated into the existing IT
environment nor does it have to support a complex process. For this reason, sales and
marketing projects often serve as a “door opener” to introduce Virtual or Augmented
Reality in a company. In our project, a tablet PC is used to support the sales process for
the retrofitting of ships. The tablet serves as a window to the “future” and shows
customers what their ship could look like with a new generation of rescue boats and
davits. Since the application is not designed for use on a real ship, we used large
posters to visualize a cruise liner. The end user can choose between different
augmentations: the new davit plus rescue boat in parking position, in swing-out
position, and with additional dimensions.
Fig. 2. AR application for the retrofitting of rescue boats and davits


Challenges. Because the application was set up at a trade fair booth, there were no
challenges specific to the maritime sector. The challenge here was to implement an
application that is very robust, works under different lighting conditions and is easy to
use for visitors without explanation.
Implementation Aspects. The application made use of Fraunhofer IGD’s framework
for mobile AR applications [25]. This framework speeds up the authoring phase and
generates the content that is read in by the player, which runs on an iPhone or iPad and
implements markerless optical tracking (in our case, a poster tracker).


2.2 Product Validation


Use Case. A kinematic simulation was introduced into the design process to validate the model of a davit. To build confidence in the simulation, we implemented an AR application that overlays the simulation onto real video footage of a physical acceptance trial. An animated wireframe model of the davit is rendered into the real video from the correct viewpoint [26].
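Rendering the wireframe from the correct viewpoint amounts to projecting the davit's 3D vertices through the pose and intrinsics of the real camera. A minimal pinhole-camera sketch, assuming an undistorted image, a pose (R, t) delivered by the registration step, and known intrinsics fx, fy, cx, cy; all names are hypothetical.

```python
from typing import List, Optional, Sequence, Tuple

Vec3 = Tuple[float, float, float]
Pixel = Tuple[float, float]


def project_point(p_world: Vec3, r: Sequence[Sequence[float]], t: Vec3,
                  fx: float, fy: float, cx: float, cy: float) -> Optional[Pixel]:
    """Project a 3D point to pixel coordinates with a pinhole camera.

    r and t transform world coordinates into camera coordinates:
    p_cam = r * p_world + t. Points behind the camera yield None.
    """
    xc = sum(r[0][k] * p_world[k] for k in range(3)) + t[0]
    yc = sum(r[1][k] * p_world[k] for k in range(3)) + t[1]
    zc = sum(r[2][k] * p_world[k] for k in range(3)) + t[2]
    if zc <= 0:
        return None
    return (fx * xc / zc + cx, fy * yc / zc + cy)


def project_wireframe(vertices: Sequence[Vec3], edges: Sequence[Tuple[int, int]],
                      r: Sequence[Sequence[float]], t: Vec3,
                      fx: float, fy: float, cx: float, cy: float) -> List[Tuple[Pixel, Pixel]]:
    """2D line segments for all edges whose endpoints both lie in front of the camera."""
    pts = [project_point(v, r, t, fx, fy, cx, cy) for v in vertices]
    return [(pts[i], pts[j]) for i, j in edges
            if pts[i] is not None and pts[j] is not None]
```

For the animated overlay, the simulation would supply updated vertex positions per video frame while r and t stay fixed for a static camera.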


Challenges. The challenging parts of this project were the poor video quality of the test trial and the fact that the small partner company had little experience with 3D CAD, simulation and data exchange. In our experience, these two challenges are not unique to the maritime sector but quite typical.


Implementation Aspects. We used a marker for an initial pose estimate and then used the available CAD model for fine registration. The visualization was implemented with our VR/AR framework instantReality [27].
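The pipeline in Fig. 3 resembles classic model-based edge registration: control points are sampled on the projected CAD edges, and for each one a corresponding hit point is searched in the image along the edge normal. The sketch below shows only that one-dimensional search on a grayscale image stored as nested lists; the search range and gradient threshold are assumed values, and the subsequent pose optimization that minimizes the control-to-hit-point distances is not shown.

```python
from typing import List, Optional, Tuple


def find_hit_point(image: List[List[float]], control: Tuple[float, float],
                   normal: Tuple[float, float], search_range: int,
                   min_gradient: float = 10.0) -> Optional[Tuple[int, int]]:
    """Search along the edge normal for the strongest intensity change.

    image is indexed as image[y][x]; control is a control point (x, y) on a
    projected CAD edge; normal is the unit normal of that edge in the image.
    Returns the pixel with the largest gradient above min_gradient, or None.
    """
    h, w = len(image), len(image[0])
    best: Optional[Tuple[int, int]] = None
    best_grad = min_gradient
    for s in range(-search_range, search_range + 1):
        x = int(round(control[0] + s * normal[0]))
        y = int(round(control[1] + s * normal[1]))
        # Central difference along the normal, using nearest-pixel sampling.
        x1, y1 = int(round(x + normal[0])), int(round(y + normal[1]))
        x0, y0 = int(round(x - normal[0])), int(round(y - normal[1]))
        if not (0 <= x0 < w and 0 <= x1 < w and 0 <= y0 < h and 0 <= y1 < h):
            continue
        grad = abs(image[y1][x1] - image[y0][x0])
        if grad > best_grad:
            best, best_grad = (x, y), grad
    return best
```

Returning None when no sufficiently strong edge is found lets the registration simply ignore that control point, which helps with poor video quality.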


Fig. 3. The processing pipeline: (a) original video frame; (b) visible edges extracted from the CAD model; (c) edges extracted from the image; (d) augmentation with control points and corresponding hit points
2.3 Design and Production


Use Case. As mentioned above, shipbuilding is characterized by a highly parallel design and production process: some parts of the ship are still at a very early design stage while other parts have already been built. Similar requirements arise from late changes, which may be requested by the customer or stem from other disruptions in the process, such as wrong dimensions of delivered supply parts or significant deviations in production steps such as welding.

A specific use case is the design of pipes within a bundle of existing pipes. Fitting pipes can be designed in their physical context using an AR application in which the start point and end point of the pipe are selected [28].
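Once the start point and end point have been picked in the AR view, the parameters of a straight segment follow directly from their difference vector. A minimal sketch, assuming both points are given in a ship coordinate system in millimetres with the z axis pointing up; the parameter names are illustrative, since the exchange format sent to production is not specified.

```python
import math
from typing import Dict, Tuple

Vec3 = Tuple[float, float, float]


def straight_pipe_parameters(start: Vec3, end: Vec3) -> Dict[str, object]:
    """Length, unit direction, azimuth and elevation of a straight pipe segment.

    start and end are assumed to be in ship coordinates (mm), z pointing up.
    """
    d = tuple(e - s for s, e in zip(start, end))
    length = math.sqrt(sum(c * c for c in d))
    if length == 0.0:
        raise ValueError("start and end point coincide")
    direction = tuple(c / length for c in d)
    azimuth_deg = math.degrees(math.atan2(d[1], d[0]))  # angle within the x-y plane
    elevation_deg = math.degrees(math.atan2(d[2], math.hypot(d[0], d[1])))
    return {
        "length_mm": length,
        "direction": direction,
        "azimuth_deg": azimuth_deg,
        "elevation_deg": elevation_deg,
    }
```

Usage: `straight_pipe_parameters((0.0, 0.0, 0.0), (300.0, 400.0, 0.0))` describes a horizontal segment of 500 mm.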


Fig. 4. Mobile design application for pipe design in the physical context 


Challenges. In this production-oriented use case, we had to meet the customer's very high accuracy requirements. The system is intended to send the design parameters of the pipes directly to the production department without additional post-processing.


Implementation Aspects. Without additional aids, we could not achieve the necessary accuracy. A measurement tool with two illuminating points was therefore used to align virtual straight pipe segments so that they fit exactly through existing bolt holes.
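How the two illuminating points are evaluated is not detailed here. One plausible reading is that they define the axis of the straight segment; checking whether that axis passes through the bolt holes then reduces to point-to-line distances. A sketch under that assumption, with hypothetical function names and tolerance parameter:

```python
import math
from typing import Sequence, Tuple

Vec3 = Tuple[float, float, float]


def distance_to_axis(p: Vec3, a: Vec3, b: Vec3) -> float:
    """Perpendicular distance of point p from the infinite line through a and b."""
    ab = [b[i] - a[i] for i in range(3)]
    ap = [p[i] - a[i] for i in range(3)]
    cross = (ab[1] * ap[2] - ab[2] * ap[1],
             ab[2] * ap[0] - ab[0] * ap[2],
             ab[0] * ap[1] - ab[1] * ap[0])
    norm_ab = math.sqrt(sum(c * c for c in ab))
    if norm_ab == 0.0:
        raise ValueError("the two reference points coincide")
    return math.sqrt(sum(c * c for c in cross)) / norm_ab


def segment_fits(bolt_hole_centres: Sequence[Vec3], light_a: Vec3, light_b: Vec3,
                 tolerance_mm: float) -> bool:
    """True if every bolt-hole centre lies within tolerance of the assumed pipe axis."""
    return all(distance_to_axis(p, light_a, light_b) <= tolerance_mm
               for p in bolt_hole_centres)
```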


2.4 Harbor Surveillance


Use Case. In harbor surveillance today, the operator typically has to switch between a map view with radar and AIS (Automatic Identification System) ship tracks on the one hand and several video cameras on the other. Our objective in introducing AR in the context of the Seatrax project is to bring both worlds together: for example, the operator can touch a ship in the video view to instantly get all information about that vessel, or can start a radio call immediately.
This video mode offers the classical camera view, augmented with an annotation on every vessel showing its name and optional metadata from the Vessel Tracking System (VTS). Using the follow-me function, the operator can lock the camera onto a ship, and the camera then follows the ship automatically. Buoys, lights, navigation aids and sea marks are also drawn as overlays into the video stream at the right position, in the right scale and with correct perspective.
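For the follow-me function, the camera has to be aimed at a ship whose position is known in geographic coordinates. A minimal sketch, assuming the camera's latitude, longitude and mounting height are known, that pan 0° points to true north, and that a local flat-earth (equirectangular) approximation is adequate over harbor distances; a real pan-tilt head would additionally need its mounting offset calibrated. All names are hypothetical.

```python
import math
from typing import Tuple

EARTH_RADIUS_M = 6_371_000.0  # mean Earth radius


def local_east_north(cam_lat: float, cam_lon: float,
                     lat: float, lon: float) -> Tuple[float, float]:
    """East/north offset in metres (equirectangular approximation, a few km at most)."""
    d_lat = math.radians(lat - cam_lat)
    d_lon = math.radians(lon - cam_lon)
    east = EARTH_RADIUS_M * d_lon * math.cos(math.radians(cam_lat))
    north = EARTH_RADIUS_M * d_lat
    return east, north


def follow_me_pan_tilt(cam_lat: float, cam_lon: float, cam_height_m: float,
                       ship_lat: float, ship_lon: float,
                       target_height_m: float = 0.0) -> Tuple[float, float]:
    """Pan (degrees clockwise from north) and tilt (degrees, negative = down) towards a ship."""
    east, north = local_east_north(cam_lat, cam_lon, ship_lat, ship_lon)
    pan = math.degrees(math.atan2(east, north)) % 360.0
    horizontal = math.hypot(east, north)
    tilt = math.degrees(math.atan2(target_height_m - cam_height_m, horizontal))
    return pan, tilt
```

Calling this each time a new ship position arrives and sending the result to the camera keeps the selected ship in view.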

Besides the AR mode, the system should offer the operator a second mode: the virtual model mode. This mode presents a nautical map and/or a static 3D harbor model augmented with dynamic ship models representing the vessels in the real world. Parametric VR ship models for different AIS ship types (e.g. fishing, towing, tug, pilot) will be adapted to the length and breadth obtained from the VTS. In this mode, the bridge view function allows the operator to “jump” to any ship's bridge, so that the operator has the same viewpoint as the captain and can check visibility conditions. The other function of the virtual model mode is the bird's-eye view, which provides an overview.
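AIS static data describe a ship's size by the distances from its position reference point to bow (A), stern (B), port (C) and starboard (D), so length = A + B and breadth = C + D. The sketch below scales a parametric model accordingly; it assumes the VTS delivers these four values, and the base model dimensions per ship type are invented placeholders.

```python
from dataclasses import dataclass
from typing import Dict, Tuple

# Hypothetical nominal (length, breadth) in metres of each parametric base model.
BASE_MODELS: Dict[str, Tuple[float, float]] = {
    "fishing": (25.0, 7.0),
    "towing": (30.0, 9.0),
    "tug": (28.0, 10.0),
    "pilot": (16.0, 5.0),
    "generic": (50.0, 10.0),
}


@dataclass
class ShipDimensions:
    to_bow: float        # A: reference point to bow (m)
    to_stern: float      # B: reference point to stern (m)
    to_port: float       # C: reference point to port side (m)
    to_starboard: float  # D: reference point to starboard side (m)

    @property
    def length(self) -> float:
        return self.to_bow + self.to_stern

    @property
    def breadth(self) -> float:
        return self.to_port + self.to_starboard


def model_scale(ship_type: str, dims: ShipDimensions) -> Tuple[float, float]:
    """(longitudinal, transverse) scale factors for the parametric model of a ship type."""
    base_length, base_breadth = BASE_MODELS.get(ship_type, BASE_MODELS["generic"])
    if dims.length <= 0 or dims.breadth <= 0:
        return 1.0, 1.0  # dimensions not reported: show the unscaled model
    return dims.length / base_length, dims.breadth / base_breadth
```

The values A and C also tell where the reported position lies within the hull, which matters when placing the scaled model on the map.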


Fig. 5. Augmented harbor scene: (left) real video mode, (right) virtual model mode 


Challenges. There were three big challenges in the project. The first was the camera mechanics. To control the Funkwerk surveillance camera, we used the MULTISEC protocol. Whereas the pan and tilt of the Funkwerk camera housing are very precise (to a tenth of a degree), the built-in zoom mechanism is very imprecise. The lens system exhibits hysteresis, so that the same input value leads to a different zoom factor depending on the zoom direction. We solved this by introducing discrete zoom steps.
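Discrete zoom steps can be paired with a calibration table that stores the measured field of view of each step, which the virtual camera then uses. The sketch below adds one further assumption not stated above: each step is always approached from the wide-angle side, so that the hysteresis cannot change the calibrated field of view. The raw command values and FOV figures are hypothetical, and the MULTISEC message encoding is not modelled.

```python
from typing import List, Sequence, Tuple

# Hypothetical calibration table: (raw zoom command, measured horizontal FOV in degrees),
# ordered from wide angle to telephoto and measured while approaching each step
# from the wide-angle side.
ZOOM_STEPS: List[Tuple[int, float]] = [
    (0, 58.0), (2000, 34.5), (4000, 20.1), (6000, 11.8), (8000, 6.9),
]


def nearest_step(requested_raw: int,
                 steps: Sequence[Tuple[int, float]] = ZOOM_STEPS) -> int:
    """Index of the calibrated zoom step closest to a requested raw value."""
    return min(range(len(steps)), key=lambda i: abs(steps[i][0] - requested_raw))


def zoom_commands(current: int, target: int,
                  steps: Sequence[Tuple[int, float]] = ZOOM_STEPS) -> List[int]:
    """Raw commands that reach step `target` from the wide-angle side.

    When zooming out, the lens first overshoots to the next wider step and then
    zooms in to the target. Step 0 is assumed to be the mechanical wide-angle
    end stop and is commanded directly.
    """
    if target >= current or target == 0:
        return [steps[target][0]]
    return [steps[target - 1][0], steps[target][0]]


def virtual_fov(step: int, steps: Sequence[Tuple[int, float]] = ZOOM_STEPS) -> float:
    """Calibrated FOV to apply to the virtual camera for a given step."""
    return steps[step][1]
```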

Secondly, the lens distortion had to be corrected. The radial distortion of the lens is minimal, but the optical center (also known as the principal point) does not lie at the center of the image, as most synthetic camera models assume. This is a big problem because we need to synchronize the real camera precisely with a virtual one. The camera model of the VRML standard does not take this parameter into account. Our AR framework instantReality [33] allows this parameter to be set and provides an extended camera model on top of the VRML standard.
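A principal point away from the image center can be reproduced in a renderer with an off-axis (asymmetric) perspective frustum. The following sketch derives the near-plane bounds from pinhole intrinsics, assuming pixel coordinates with the origin at the top-left corner and ignoring half-pixel conventions; it illustrates the geometry only and does not reflect the actual parameters of instantReality's extended camera model.

```python
from typing import Tuple


def off_axis_frustum(fx: float, fy: float, cx: float, cy: float,
                     width: int, height: int,
                     near: float) -> Tuple[float, float, float, float]:
    """Near-plane bounds (left, right, bottom, top) of an off-axis frustum.

    fx, fy, cx, cy are pinhole intrinsics in pixels with the image y axis
    pointing down; the returned bounds use the OpenGL convention (y up), as
    expected by e.g. glFrustum.
    """
    left = -cx * near / fx
    right = (width - cx) * near / fx
    top = cy * near / fy
    bottom = -(height - cy) * near / fy
    return left, right, bottom, top
```

With cx = width / 2 and cy = height / 2, the frustum becomes symmetric, which is the special case a standard field-of-view camera model covers.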

Thirdly, we had to cope with the delays caused by the irregular reporting interval of AIS signals (from 10 s up to 10 min). Ships' AIS transmitters send signals at intervals that depend on different circumstances, e.g. their speed over ground, their current rate of turn, and the availability of free time slots on the broadcast frequency. Assuming that a ship has high inertia due to its large mass, we implemented a simple linear interpolation. As an alternative position source, the system can connect to the VTS to use the exact and frequently updated positions derived from radar information.
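A minimal sketch of the linear interpolation, assuming positions have already been converted to local metric coordinates and that each fix carries its reception time. Between the last two reports it interpolates; for display times after the last report it extrapolates along the same straight line, which is an assumption about how the gap until the next report is bridged.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass
class AisFix:
    t: float      # reception time (s)
    east: float   # local position (m)
    north: float  # local position (m)


def estimate_position(prev: AisFix, last: AisFix, t_now: float) -> Tuple[float, float]:
    """Linear interpolation between two AIS fixes, or extrapolation past the last one.

    Relies on the ship keeping speed and course between reports, which its
    large inertia makes plausible over short intervals.
    """
    dt = last.t - prev.t
    if dt <= 0:
        return last.east, last.north
    s = (t_now - prev.t) / dt  # 0 at prev, 1 at last, > 1 beyond the last report
    return (prev.east + s * (last.east - prev.east),
            prev.north + s * (last.north - prev.north))
```

Usage: with fixes at t = 0 s (0, 0) and t = 10 s (50, 0), the estimate at t = 20 s is (100, 0).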


Implementation Aspects. We developed a C# application that is connected to the AR framework instantReality via its EAI (External Authoring Interface) and has connectors to the camera, to AIS and VTS, and to an AIS simulation. The solution consistently uses a 3D world to correctly place and scale the signposts showing the ship name and additional information. The user can control the cameras via pan, tilt, zoom and focus, and the camera parameters of the augmentation always mimic the current settings of the real camera: pan, tilt, zoom and focus commands are propagated synchronously to both the real camera and the virtual camera.
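Keeping the real and the virtual camera in step can be organized as a single dispatcher that forwards every operator command to both. The original application is written in C# against the EAI; the Python sketch below only illustrates the pattern, and all class and field names are hypothetical.

```python
from dataclasses import dataclass
from typing import List, Optional, Protocol


@dataclass(frozen=True)
class PtzCommand:
    pan_deg: float
    tilt_deg: float
    zoom_step: int
    focus: float


class CameraSink(Protocol):
    def apply(self, cmd: PtzCommand) -> None: ...


class RecordingCamera:
    """Hypothetical stand-in for the real camera connector or the virtual camera."""

    def __init__(self, name: str) -> None:
        self.name = name
        self.state: Optional[PtzCommand] = None

    def apply(self, cmd: PtzCommand) -> None:
        self.state = cmd  # a real connector would translate cmd into protocol messages


class PtzDispatcher:
    """Forwards each operator command to every registered camera in the same call."""

    def __init__(self, sinks: List[CameraSink]) -> None:
        self.sinks = list(sinks)

    def send(self, cmd: PtzCommand) -> None:
        for sink in self.sinks:
            sink.apply(cmd)
```

A real virtual-camera sink would also look up the calibrated field of view for the requested zoom step rather than trusting the nominal value.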


3 Discussion 


Compared to AR applications in other areas, the maritime sector clearly raises some additional challenges. First of all, we have to find a solution for registration and tracking. The ship's hull or the water shields GPS signals, so we have to rely on alternative technologies to obtain an initial position. For practical reasons, optical tracking should not be based on markers but on implicit (natural) features. However, many areas of a ship contain very similar objects, such as long pipes or steel plates, so we need additional sensors and adequate sensor fusion to determine the user's pose. Outside the shipyard, the ship itself moves in 6 DOF, which makes the tracking problem even harder.
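As a minimal illustration of sensor fusion (not the fusion approach used in our projects), a complementary filter can combine a gyroscope, which is smooth but drifts, with an absolute but noisy or intermittent heading, for example from optical tracking. The sketch handles only the yaw angle, and the blending factor alpha is an assumed value.

```python
from typing import Optional


def wrap_deg(angle: float) -> float:
    """Wrap an angle to the interval [-180, 180)."""
    return (angle + 180.0) % 360.0 - 180.0


class HeadingComplementaryFilter:
    """Fuse a gyroscope yaw rate with an occasional absolute heading."""

    def __init__(self, initial_heading_deg: float, alpha: float = 0.98) -> None:
        self.heading = initial_heading_deg % 360.0
        self.alpha = alpha  # weight of the gyro prediction in each correction

    def update(self, gyro_rate_dps: float, dt: float,
               absolute_heading_deg: Optional[float] = None) -> float:
        predicted = self.heading + gyro_rate_dps * dt  # integrate the gyro
        if absolute_heading_deg is not None:
            # Blend along the shortest angular difference to handle the 359/0 wrap.
            correction = wrap_deg(absolute_heading_deg - predicted)
            predicted += (1.0 - self.alpha) * correction
        self.heading = predicted % 360.0
        return self.heading
```

When the absolute heading is unavailable (e.g. no distinctive features in view), the filter simply keeps integrating the gyro until the next optical fix arrives.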

In sectors with mass products, high-quality 3D models from the design phase are nearly 100% equivalent to the physical product. The digital 3D model of a ship from the engineering department, by contrast, typically does not cover all details or late changes caused by complications in production or varying supplier parts. Furthermore, the virtual ship and the real ship diverge even more during operation with every part replaced during repair and overhaul. But without a correct 3D model, many AR applications are hard to implement robustly.

Mobile AR systems with integrated optical 3D reconstruction have the potential to replace the expensive process of laser scanning in some areas; for example, they could be used to collect all the geometric information needed for the detailed planning of a ballast water treatment retrofit.
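At the heart of optical 3D reconstruction is triangulation: a point observed from two tracked camera positions lies where the two viewing rays (nearly) meet. A minimal midpoint-triangulation sketch, assuming camera centers and ray directions in a common coordinate system are already known from tracking; the function names are illustrative.

```python
from typing import Sequence, Tuple

Vec3 = Tuple[float, float, float]


def _dot(a: Sequence[float], b: Sequence[float]) -> float:
    return sum(x * y for x, y in zip(a, b))


def _add(a: Sequence[float], b: Sequence[float], s: float = 1.0) -> Vec3:
    """a + s * b"""
    return tuple(x + s * y for x, y in zip(a, b))


def triangulate_midpoint(c1: Vec3, d1: Vec3, c2: Vec3, d2: Vec3) -> Vec3:
    """Midpoint of the shortest segment between rays c1 + s*d1 and c2 + t*d2."""
    w0 = _add(c1, c2, -1.0)  # c1 - c2
    a, b, c = _dot(d1, d1), _dot(d1, d2), _dot(d2, d2)
    d, e = _dot(d1, w0), _dot(d2, w0)
    denom = a * c - b * b
    if abs(denom) < 1e-12:
        raise ValueError("rays are (nearly) parallel; no reliable triangulation")
    # Parameters of the closest points, from setting the gradient of
    # |w0 + s*d1 - t*d2|^2 with respect to s and t to zero.
    s = (b * e - c * d) / denom
    t = (a * e - b * d) / denom
    p1 = _add(c1, d1, s)
    p2 = _add(c2, d2, t)
    return tuple((x + y) / 2.0 for x, y in zip(p1, p2))
```

The baseline between the two camera positions matters: nearly parallel rays make the estimate unstable, which is why the function refuses them.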

Future AR applications will also address underwater scenarios, such as diver assistance for inspection and repair operations at the rudder or propeller. For these kinds of applications, we have to solve all the physical challenges of underwater settings, such as optical refraction, difficult lighting conditions and marine snow, which results in extremely noisy images.
4 Summary and Outlook


Augmented Reality is a promising approach to supporting users in many different situations. AR applications for maritime use cases have to cope with a number of specific technical and economic challenges. These challenges affect not only the feasibility but also the business case for commercial use. However, the increasing importance of the sea as the backbone of worldwide transport, as a premier location for renewable energy, and as a supplier of food for a growing world population motivates (applied) research in this area.

In this article, we have presented several examples of AR applications that have been developed for end users in different sectors of the maritime industry. We have demonstrated that technical solutions can already be found for most of the difficult boundary conditions.

But there is still a long way to go: additional research is necessary to prepare the ground for a wider adoption of AR in all sectors of the maritime industry. The following topics are on our R&D roadmap:

Efficient authoring of AR content is a precondition for stakeholders in a market that is characterized by SMEs (small and medium-sized enterprises) and a lack of economies of scale. Our tools that rely on existing material such as manuals [25] or CAD data [29] are a first step in this direction.

Robust optical tracking, even under very poor conditions (e.g. underwater), requires several steps that correct the various physical effects, such as color cast [30] or noise, that would otherwise prevent a feature correspondence algorithm from working correctly.
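The color cast correction of [30] is not described here. As a generic stand-in, the well-known gray-world method scales each color channel so that all channel means become equal; the sketch below applies it to a list of RGB pixels and is meant only to illustrate the kind of preprocessing step involved before feature matching.

```python
from typing import List, Tuple

Pixel = Tuple[float, float, float]


def gray_world_correct(pixels: List[Pixel]) -> List[Pixel]:
    """Remove a global color cast under the gray-world assumption.

    Each channel is scaled so that its mean equals the mean over all three
    channels; results are clipped to 255.
    """
    if not pixels:
        return []
    n = len(pixels)
    means = [sum(p[c] for p in pixels) / n for c in range(3)]
    gray = sum(means) / 3.0
    gains = [gray / m if m > 0 else 1.0 for m in means]
    return [tuple(min(255.0, p[c] * gains[c]) for c in range(3)) for p in pixels]
```

Underwater images usually violate the gray-world assumption to some degree (red light is absorbed much faster than blue), so dedicated methods typically model the attenuation explicitly.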

Precise in-ship tracking is an important building block for nearly all AR use cases during the operation of a ship. Here, we rely on the fusion of various sensors, with promising first results [31].
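On a moving ship, AR content is attached to the ship, so the user's pose is needed in the ship's frame rather than in a world frame. Assuming poses of both ship and user are available in a common world frame (for example from separate sensors), the relative pose follows from rigid-transform inversion and composition. The sketch below is illustrative only and is not the method of [31].

```python
from typing import List, Sequence, Tuple

Mat3 = List[List[float]]
Vec3 = Tuple[float, float, float]
# A pose (R, t) of frame B in frame A maps B-coordinates to A-coordinates: x_A = R x_B + t.
Pose = Tuple[Mat3, Vec3]


def transpose(m: Sequence[Sequence[float]]) -> Mat3:
    return [[m[j][i] for j in range(3)] for i in range(3)]


def matmul(a: Sequence[Sequence[float]], b: Sequence[Sequence[float]]) -> Mat3:
    return [[sum(a[i][k] * b[k][j] for k in range(3)) for j in range(3)] for i in range(3)]


def matvec(a: Sequence[Sequence[float]], v: Sequence[float]) -> Vec3:
    return tuple(sum(a[i][k] * v[k] for k in range(3)) for i in range(3))


def invert(pose: Pose) -> Pose:
    """Inverse of a rigid transform: (R, t)^-1 = (R^T, -R^T t)."""
    r, t = pose
    rt = transpose(r)
    return rt, tuple(-x for x in matvec(rt, t))


def compose(a_b: Pose, b_c: Pose) -> Pose:
    """Pose of frame C in frame A from the pose of B in A and of C in B."""
    r1, t1 = a_b
    r2, t2 = b_c
    return matmul(r1, r2), tuple(x + y for x, y in zip(matvec(r1, t2), t1))


def user_in_ship(world_ship: Pose, world_user: Pose) -> Pose:
    """Remove the ship's own 6-DOF motion from the user's world pose."""
    return compose(invert(world_ship), world_user)
```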

Cyber-physical equivalence is our umbrella term for a bundle of technologies that are needed in the context of Augmented Shipbuilding. It refers to the continuous mutual alignment of the virtual world and the real world. A first small example of a 3D discrepancy check via AR in the outfitting phase was demonstrated successfully [32], but it has to be extended to much larger volumes, must be seamlessly integrated into the environment, and must become flexible enough to adapt to similar scenarios.
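A 3D discrepancy check compares measured geometry against the CAD model and flags deviations beyond a tolerance. A brute-force sketch, assuming the as-built geometry is available as measured 3D points and the CAD model is sampled as a point set in the same coordinate system; the names and the tolerance are hypothetical. Larger volumes, as called for above, would need a spatial index such as a k-d tree instead of the linear search.

```python
import math
from typing import List, Sequence, Tuple

Vec3 = Tuple[float, float, float]


def discrepancy_report(measured: Sequence[Vec3], cad_points: Sequence[Vec3],
                       tolerance_mm: float) -> List[Tuple[int, float]]:
    """(index, distance) for every measured point farther than tolerance from the CAD sample.

    Nearest neighbours are found by brute force, which is adequate only for
    small volumes such as a single outfitting item.
    """
    deviations: List[Tuple[int, float]] = []
    for i, p in enumerate(measured):
        nearest = min(math.dist(p, q) for q in cad_points)
        if nearest > tolerance_mm:
            deviations.append((i, nearest))
    return deviations
```

Usage: the returned indices identify the measured points whose deviation should be highlighted in the AR view.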